Re: Help with StopFilterFactory

2014-08-20 Thread heaven
> What release of Solr?
4.8.1.

> Do you have autoGeneratePhraseQueries="true" on the field?
No, the config I've provided is the exact one I'm using.

> And when you said "But any of these does", did you mean "But NONE of these
> does"?
Whoops, yes, fixed that.





Re: Help with StopFilterFactory

2014-08-20 Thread heaven
From this page: http://wiki.apache.org/solr/SchemaXml
> autoGeneratePhraseQueries="true|false" (in schema version 1.4 and later
> this now defaults to false)
Just checked: I have <schema name="sunspot" version="1.0">, so this may be true
by default?
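
For reference, a minimal sketch of setting the attribute explicitly on a field
type, so the schema-version default no longer matters (type name and analyzer
are illustrative):

<fieldType name="text" class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>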





Re: Help with StopFilterFactory

2014-08-20 Thread heaven
Hello,

Yes, with schema version 1.5 all those examples that didn't work do work
now. But results also include records that match by "com", "twitter", etc.,
which is not desirable.

It seems we do need autoGeneratePhraseQueries="true" but also need to ignore
blacklisted words. Is that somehow possible?

Best,
Alexander





Unsupported ContentType: application/pdf Not in: [application/xml,​ text/csv,​ text/json,​ application/csv,​ application/javabin,​ text/xml,​ application/json]

2014-08-20 Thread Croci Francesco Luigi (ID SWS)
Hello,

I have Solr 4.9.0 and I'm getting the above error when I try to index a PDF
document with the Solr web interface.

Here are my schema and solrconfig. Am I missing something?

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" postingsFormat="SimpleText" />
    <fieldtype name="ignored" class="solr.TextField" />
    <fieldtype name="text" class="solr.TextField" postingsFormat="SimpleText">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" /> <!-- Lowercases the letters in each token. Leaves non-letter tokens alone. -->
        <filter class="solr.TrimFilterFactory"/> <!-- Trims whitespace at either end of a token. -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> <!-- Discards common words. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
  </types>

  <fields>
    <field name="signatureField" type="string" indexed="true" stored="true" multiValued="false" />
    <dynamicField name="ignored_*" type="ignored" multiValued="true" indexed="false" stored="false" />
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="fullText" type="text" indexed="true" multiValued="true" />
  </fields>

  <defaultSearchField>fullText</defaultSearchField>

  <solrQueryParser defaultOperator="OR" />
  <uniqueKey>id</uniqueKey>
</schema>



<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory" />

  <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />

  <!-- <lib dir="${solr.core.instanceDir}/lib" /> -->
  <lib dir="${solr.core.instanceDir}/dist/" regex="solr-cell-\d.*\.jar" />
  <lib dir="${solr.core.instanceDir}/contrib/extraction/lib" regex=".*\.jar" />
  <!-- <lib dir="${solr.core.instanceDir}/dist/" regex="solr-langid-.*\.jar" />
  <lib dir="${solr.core.instanceDir}/contrib/langid/lib/" /> -->

  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />

  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />

  <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">deduplication</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="captureAttr">true</str>
      <str name="lowernames">false</str>
      <str name="overwrite">false</str>
      <str name="literalsOverride">true</str>
      <str name="uprefix">ignored_</str>
      <str name="fmap.a">link</str>
      <str name="fmap.content">fullText</str>
      <!-- the configuration here could be useful for tests -->
      <str name="update.chain">deduplication</str>
    </lst>

Re: Intermittent error indexing SolrCloud 4.7.0

2014-08-20 Thread Shawn Heisey
On 8/19/2014 7:23 PM, S.L wrote:
 I get a "No Live SolrServers available to handle this request" error
 intermittently while indexing in a SolrCloud cluster with 3 shards and
 replication factor of 2.

 I am using Solr 4.7.0.

 Please see the stack trace below.

There's pretty much zero information to go on here.  I would suspect
performance problems leading to zookeeper client timeouts, which will
cause Solr to think servers are down when they're not.  The most likely
causes for performance issues (aside from simply trying to make too many
queries or simultaneous update requests) are:

* A Java heap that's too small.
* A Java heap that's very large, without proper GC tuning.
* Not enough RAM for the index size.

The apache wiki server is down at the moment, so this URL will not work
at the time I am writing.  When it works, it contains a larger
description of the problems mentioned above, and a few other ideas:

http://wiki.apache.org/solr/SolrPerformanceProblems

You can visit the google cache for that page, which I believe is
completely up to date:

http://webcache.googleusercontent.com/search?q=cache:dv-tzFUweQ8J:https://wiki.apache.org/solr/SolrPerformanceProblems+&cd=1&hl=en&ct=clnk

Thanks,
Shawn



RE: Spellchecking suggestions won't collate

2014-08-20 Thread Dyer, James
Because "my" is the 7th suggestion down the list, it is going to need more than
30 tries to figure out the one that can give some hits.  You can increase
maxCollationTries if you're willing to endure the performance penalty of
trying so many replacement queries.  This case actually highlights why
DirectSpellChecker by default doesn't even bother with short words like this.

Rather than letting the spellchecker check words this small, possibly you can
just scan the user's input and make any words < 4 characters long
optional?  Or even just use an mm below 100% (65%?)  I realize this will give
you a small loss of precision, but the recall will be better and you'll have to
rely less on spellcheck.
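
For reference, a hedged sketch of such a request with a relaxed mm (parameter
values are illustrative; %25 is the URL-encoded percent sign):

/select?defType=edismax&qf=BUS_BUSINESS_NAME_PHRASE&q=Mi+Next+Promo&mm=65%25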

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 15, 2014 3:21 PM
To: Solr User List
Subject: Spellchecking suggestions won't collate

It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
    "status":0,
    "QTime":31,
    "params":{
      "spellcheck":"on",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "indent":"true",
      "q":"Mi Next Promo",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "mi",{
        "numFound":10,
        "startOffset":0,
        "endOffset":2,
        "suggestion":["mr",
          "mp",
          "mid",
          "mix",
          "mb",
          "mj",
          "my",
          "md",
          "mc",
          "ma"]},
      "next",{
        "numFound":3,
        "startOffset":3,
        "endOffset":7,
        "suggestion":["nest",
          "news",
          "neil"]},
      "promo",{
        "numFound":4,
        "startOffset":8,
        "endOffset":13,
        "suggestion":["photo",
          "prime",
          "pronto",
          "prof"]}]},

The actual business name is "My Next Promo", which I'm hoping would be the
collation value.

Thanks,

Corey



Business Name Collation

2014-08-20 Thread Corey Gerhardt
Solr 4.8.1

Correct value: "Wardell F E B Dr"

Just wondering if anyone can see an issue with my spellchecker settings.  There 
is no collation value and I'm hoping that someone can explain why.

<lst name="spellchecker">
  <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="distanceMeasure">internal</str>
  <float name="accuracy">0.8</float>
  <int name="maxEdits">2</int>
  <int name="minPrefix">1</int>
  <int name="maxInspections">5</int>
  <int name="minQueryLength">1</int>
  <float name="thresholdTokenFrequency">0.0001</float>
  <float name="maxQueryFrequency">0.01</float>
  <str name="buildOnCommit">true</str>
</lst>
</searchComponent>

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "mm":"100%",
      "spellcheck":"true",
      "sort":"score desc,BUS_PRIORITY_RANK desc,BUS_CITY_EXACT asc,BUS_BUSINESS_NAME_SORT asc",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "fl":"BUS_IS_TOLL_FREE,score,id,BUS_BUSINESS_NAME,BUS_SERVICE_AREA,BUS_CITY,BUS_PRIORITY_RANK,BUS_LISTING_ID,BUS_DW_CUST_ID",
      "spellcheck.accuracy":"0.8",
      "debug.explain.structured":"true",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "spellcheck":"true",
      "spellcheck.accuracy":"0.8",
      "indent":"true",
      "q":"Wardel F E B Dr",
      "spellcheck.collate":"true",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "wardel",{
        "numFound":1,
        "startOffset":0,
        "endOffset":6,
        "suggestion":["wardell"]}]},
  "debug":{
    "rawquerystring":"Wardel F E B Dr",
    "querystring":"Wardel F E B Dr",
    "parsedquery":"(+((DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:wardel)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:f)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:e)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:b)) DisjunctionMaxQuery(((BUS_BUSINESS_NAME_PHRASE:dr BUS_BUSINESS_NAME_PHRASE:doctor))))~5))/no_coord",
    "parsedquery_toString":"+(((BUS_BUSINESS_NAME_PHRASE:wardel) (BUS_BUSINESS_NAME_PHRASE:f) (BUS_BUSINESS_NAME_PHRASE:e) (BUS_BUSINESS_NAME_PHRASE:b) ((BUS_BUSINESS_NAME_PHRASE:dr BUS_BUSINESS_NAME_PHRASE:doctor)))~5)",
    "explain":{},
    "QParser":"ExtendedDismaxQParser",
    "altquerystring":null,
    "boost_queries":null,
    "parsed_boost_queries":[],
    "boostfuncs":null,

Thanks,

Corey


RE: Spellchecking suggestions won't collate

2014-08-20 Thread Corey Gerhardt
I'm working with business names, which are sometimes even people's names such as
"Wardell F E B Dr".  I suspect I need to change my logic to not rely on
spellchecking so much, as you suggest.

Thanks.

Corey

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: August-20-14 9:37 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellchecking suggestions won't collate

Because "my" is the 7th suggestion down the list, it is going to need more than
30 tries to figure out the one that can give some hits.  You can increase
maxCollationTries if you're willing to endure the performance penalty of
trying so many replacement queries.  This case actually highlights why
DirectSpellChecker by default doesn't even bother with short words like this.

Rather than letting the spellchecker check words this small, possibly you can
just scan the user's input and make any words < 4 characters long
optional?  Or even just use an mm below 100% (65%?)  I realize this will give
you a small loss of precision, but the recall will be better and you'll have to
rely less on spellcheck.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 15, 2014 3:21 PM
To: Solr User List
Subject: Spellchecking suggestions won't collate

It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
    "status":0,
    "QTime":31,
    "params":{
      "spellcheck":"on",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "indent":"true",
      "q":"Mi Next Promo",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "mi",{
        "numFound":10,
        "startOffset":0,
        "endOffset":2,
        "suggestion":["mr",
          "mp",
          "mid",
          "mix",
          "mb",
          "mj",
          "my",
          "md",
          "mc",
          "ma"]},
      "next",{
        "numFound":3,
        "startOffset":3,
        "endOffset":7,
        "suggestion":["nest",
          "news",
          "neil"]},
      "promo",{
        "numFound":4,
        "startOffset":8,
        "endOffset":13,
        "suggestion":["photo",
          "prime",
          "pronto",
          "prof"]}]},

The actual business name is "My Next Promo", which I'm hoping would be the
collation value.

Thanks,

Corey



Re: Apache Solr Wiki

2014-08-20 Thread Julie . Voss
New to Solr and looking at an Endeca to Solr/hybris implementation. Is
there anything available about migrating existing rules from Endeca to
Solr/hybris? So far I haven't seen anything.

Thank you!



From:   Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Date:   08/19/2014 05:16 PM
Subject:Re: Apache Solr Wiki



Done, have fun!

On Tue, Aug 19, 2014 at 10:07 AM,  julie.v...@anixter.com wrote:
 user name: julievoss



 From:   Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   08/19/2014 10:34 AM
 Subject:Re: Apache Solr Wiki



 Julie:

 bq: Can I also have access to the wiki?

 Sure. You need to create a Wiki logon and let us
 know what that is before we can add you to the list.

 Best,
 Erick

 On Tue, Aug 19, 2014 at 6:54 AM,  julie.v...@anixter.com wrote:
 Can I also have access to the wiki? We are at the outset of a
 Solr/Hybris
 implementation.



 From:   Mark Sun mark...@motionelements.com
 To: solr-user@lucene.apache.org
 Date:   08/18/2014 08:06 PM
 Subject:Apache Solr Wiki



 Dear Solr Wiki admin,

 We are using Solr for our multilingual Asian-language keyword search, as
 well as for a visual similarity search engine (via the pixolution plugin). We
 would like to update the "Powered by Solr" section, as well as help add to
 the knowledge base for other Solr setups.

 Can you add me, username "MarkSun", as a contributor to the wiki?

 Thank you!

 Cheers,
 Mark Sun
 CTO

 MotionElements Pte Ltd
 190 Middle Road, #10-05 Fortune Centre
 Singapore 188979
 mark...@motionelements.com

 www.motionelements.com
 =
 Asia-inspired Stock Animation | Video Footage l AE Template online
 marketplace
 =
 This message may contain confidential and/or privileged information. If
 you are not the addressee or authorized to receive this for the
 addressee,
 you must not use, copy, disclose or take any action based on this
 message
 or any information herein. If you have received this message in error,
 please advise the sender immediately by reply e-mail and delete this
 message.  Thank you for your cooperation.





RE: Business Name Collation

2014-08-20 Thread Corey Gerhardt
I'm going to reply to my own question.  After recalling a previous email from 
James Dyer, I know the answer. 

-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: August-20-14 9:54 AM
To: Solr User List
Subject: Business Name Collation

Solr 4.8.1

Correct value: "Wardell F E B Dr"

Just wondering if anyone can see an issue with my spellchecker settings.  There 
is no collation value and I'm hoping that someone can explain why.

<lst name="spellchecker">
  <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="distanceMeasure">internal</str>
  <float name="accuracy">0.8</float>
  <int name="maxEdits">2</int>
  <int name="minPrefix">1</int>
  <int name="maxInspections">5</int>
  <int name="minQueryLength">1</int>
  <float name="thresholdTokenFrequency">0.0001</float>
  <float name="maxQueryFrequency">0.01</float>
  <str name="buildOnCommit">true</str>
</lst>
</searchComponent>

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "mm":"100%",
      "spellcheck":"true",
      "sort":"score desc,BUS_PRIORITY_RANK desc,BUS_CITY_EXACT asc,BUS_BUSINESS_NAME_SORT asc",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "fl":"BUS_IS_TOLL_FREE,score,id,BUS_BUSINESS_NAME,BUS_SERVICE_AREA,BUS_CITY,BUS_PRIORITY_RANK,BUS_LISTING_ID,BUS_DW_CUST_ID",
      "spellcheck.accuracy":"0.8",
      "debug.explain.structured":"true",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "spellcheck":"true",
      "spellcheck.accuracy":"0.8",
      "indent":"true",
      "q":"Wardel F E B Dr",
      "spellcheck.collate":"true",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "wardel",{
        "numFound":1,
        "startOffset":0,
        "endOffset":6,
        "suggestion":["wardell"]}]},
  "debug":{
    "rawquerystring":"Wardel F E B Dr",
    "querystring":"Wardel F E B Dr",
    "parsedquery":"(+((DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:wardel)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:f)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:e)) DisjunctionMaxQuery((BUS_BUSINESS_NAME_PHRASE:b)) DisjunctionMaxQuery(((BUS_BUSINESS_NAME_PHRASE:dr BUS_BUSINESS_NAME_PHRASE:doctor))))~5))/no_coord",
    "parsedquery_toString":"+(((BUS_BUSINESS_NAME_PHRASE:wardel) (BUS_BUSINESS_NAME_PHRASE:f) (BUS_BUSINESS_NAME_PHRASE:e) (BUS_BUSINESS_NAME_PHRASE:b) ((BUS_BUSINESS_NAME_PHRASE:dr BUS_BUSINESS_NAME_PHRASE:doctor)))~5)",
    "explain":{},
    "QParser":"ExtendedDismaxQParser",
    "altquerystring":null,
    "boost_queries":null,
    "parsed_boost_queries":[],
    "boostfuncs":null,

Thanks,

Corey


Re: Unsupported ContentType: application/pdf Not in: [application/xml,​ text/csv,​ text/json,​ application/csv,​ application/javabin,​ text/xml,​ application/json]

2014-08-20 Thread Erik Hatcher
You need to change the handler to /update/extract - the handler that accepts 
“rich documents”, whereas /update only handles the types it mentions in the 
error message.
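
For reference, a hedged example of posting a PDF to the extracting handler with
curl (core name, document id, and file name are illustrative):

curl "http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@document.pdf"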

Erik

On Aug 20, 2014, at 9:34 AM, Croci Francesco Luigi (ID SWS) fcr...@id.ethz.ch 
wrote:

> Hello,
> 
> I have Solr 4.9.0 and I'm getting the above error when I try to index a PDF
> document with the Solr web interface.
> 
> Here are my schema and solrconfig. Am I missing something?
> 
> <?xml version="1.0" encoding="UTF-8" ?>
> <schema name="simple" version="1.1">
>   <types>
>     <fieldtype name="string" class="solr.StrField" postingsFormat="SimpleText" />
>     <fieldtype name="ignored" class="solr.TextField" />
>     <fieldtype name="text" class="solr.TextField" postingsFormat="SimpleText">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" /> <!-- Lowercases the letters in each token. Leaves non-letter tokens alone. -->
>         <filter class="solr.TrimFilterFactory"/> <!-- Trims whitespace at either end of a token. -->
>         <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> <!-- Discards common words. -->
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldtype>
>   </types>
> 
>   <fields>
>     <field name="signatureField" type="string" indexed="true" stored="true" multiValued="false" />
>     <dynamicField name="ignored_*" type="ignored" multiValued="true" indexed="false" stored="false" />
>     <field name="id" type="string" indexed="true" stored="true" multiValued="false" />
>     <field name="fullText" type="text" indexed="true" multiValued="true" />
>   </fields>
> 
>   <defaultSearchField>fullText</defaultSearchField>
> 
>   <solrQueryParser defaultOperator="OR" />
>   <uniqueKey>id</uniqueKey>
> </schema>
> 
> 
> <?xml version="1.0" encoding="UTF-8" ?>
> <config>
>   <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
>   <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory" />
> 
>   <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />
> 
>   <!-- <lib dir="${solr.core.instanceDir}/lib" /> -->
>   <lib dir="${solr.core.instanceDir}/dist/" regex="solr-cell-\d.*\.jar" />
>   <lib dir="${solr.core.instanceDir}/contrib/extraction/lib" regex=".*\.jar" />
>   <!-- <lib dir="${solr.core.instanceDir}/dist/" regex="solr-langid-.*\.jar" />
>   <lib dir="${solr.core.instanceDir}/contrib/langid/lib/" /> -->
> 
>   <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
> 
>   <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
> 
>   <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
> 
>   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.chain">deduplication</str>
>     </lst>
>   </requestHandler>
> 
>   <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
>     <lst name="defaults">
>       <str name="captureAttr">true</str>
>       <str name="lowernames">false</str>
>       <str name="overwrite">false</str>
>       <str name="literalsOverride">true</str>
>       <str name="uprefix">ignored_</str>
>       <str

YourKit Java Profiler 2014 Released

2014-08-20 Thread YourKit
Greetings,

We are glad to announce immediate availability of YourKit Java Profiler 2014.

Download: http://www.yourkit.com/download/
Changes: http://www.yourkit.com/changes/

==
MOST NOTABLE CHANGES AND NEW FEATURES:
==

NEW FEATURE: LINE NUMBERS

  - Profiling results are presented with exact source code line numbers for
- CPU sampling;
- object allocation recording;
- stack local GC roots;
- thread stack telemetry;
- thread stacks in HPROF snapshots;
- event stacks;
- exception telemetry;
- monitor profiling.

  - IDE integration: Tools | Open in IDE (F7) action can open the exact line

CPU PROFILING:

  - Improved UI responsiveness when the profiler is connected to live
profiled application

  - UI works much faster when presenting profiling results for a big number
of threads

  - Improved presentation of methods with time less than 1 millisecond

  - Java EE high-level statistics view: improvements, bug fixes

MEMORY PROFILING:

  - Memory snapshots are opened faster and require less memory to load

  - 64-bit HPROF snapshots with compressed object pointers are now detected
automatically, without user interaction

  - Android HPROF snapshots can be loaded directly, without the need to convert
them

  - The profiler format snapshot file can be converted to HPROF binary format

  - Object explorer: multiple improvements

  - Merged paths: new column Dominators

  - Merged paths: objects pending finalization are explicitly indicated

  - UI: object views performance improved

  - Action Memory | Strings by Pattern... (Ctrl+F) offers simplified and full
regex pattern syntax

  - New inspection "Duplicate objects" supersedes the previously existing
inspection "Duplicate arrays"

  - Memory inspections: other improvements

TELEMETRY:

  - Performance charts: overhead reduced

  - Daemon threads are indicated in Threads view and stack traces

  - Snapshot comparison: new feature: per-class object counts and sizes are now
compared for performance snapshots too

  - Threads view: CPU usage estimation: new option Selected thread only

PROBES:

  - New feature: "Event Timeline Chart" view graphically presents the event
sequence. It is complementary to "Event Timeline", which shows the event
sequence in a table form.

  - UI: the tab renamed to "Events" (was "Probes")

  - Event Timeline view reworked: nested events are now shown as tree nodes, and
other improvements

  - UI: objects associated with events can be opened in object explorer, if they
have not yet been collected (available for memory snapshots only)

  - Event capacity limit is now applied to the event count in each top-level
table (e.g. "File" and "Database") together with all events in its dependent
tables (e.g. "Read" and "Query"), removing result inconsistency by eliminating
partially recorded events

  - If a lasting event ends on exception, the exception detail can be stored and
presented in the UI

  - Built-in probes: multiple improvements and bug fixes

  - Reworked and simplified event model

  - Improved and streamlined support of resource-like entities

  - Changes in API

  - Probe overhead reduced

TRIGGERS:

  - Method invocation: Record Method Invocation action improved

IDE INTEGRATION:

  - Navigation action was renamed to Open in IDE in the profiler UI

  - 32-bit vs 64-bit JRE plugin setting improved

  - IntelliJ IDEA: Tools | Open in IDE (F7) action feedback improved

  - Other improvements and fixes

JAVA EE INTEGRATION:

  - Java EE integration wizard: Tomcat 8 supported

MISCELLANEOUS:

  - Agent: new synchronization mechanism significantly reduces overhead when
profiling multithreaded applications, affecting:
  - CPU tracing;
  - probes;
  - object allocation recording;
  - monitor profiling;
  - exception telemetry

  - Agent: new startup option snapshot_name_format

  - Agent: other improvements and fixes

  - Remote profiling: reduced overhead of connecting the profiler UI to a
remotely profiled application

  - Remote profiling: built-in support for SSH tunneling

  - UI: optimizations

  - UI: inspections UI layout has been changed to use available space more
effectively

  - UI: Mac: Dark color theme support added on Mac OS X

  - UI: Open Snapshot dialog: improved handling of already opened snapshots

  - UI: capture snapshot dialog: new snapshot file name macro {pid}

  - UI: Export: the file chooser dialog remembers the previously chosen output
format

  - UI: indication of terminated and disconnected sessions improved

Kindest regards,
YourKit Team


If you would not like to receive any more information about
YourKit Java Profiler, simply send an e-mail to i...@yourkit.com
with the subject line "unsubscribe".


Re: How to restore an index from a backup over HTTP

2014-08-20 Thread Jeff Wartes

Here’s the repo:
https://github.com/whitepages/solrcloud_manager


Comments/Issues/Patches welcome.


On 8/18/14, 11:28 AM, Greg Solovyev g...@zimbra.com wrote:

Thanks Jeff, I'd be interested in taking a look at the code for this
tool. My github ID is grishick.

Thanks,
Greg

- Original Message -
From: Jeff Wartes jwar...@whitepages.com
To: solr-user@lucene.apache.org
Sent: Monday, August 18, 2014 9:49:28 PM
Subject: Re: How to restore an index from a backup over HTTP

I'm able to do cross-solrcloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command (sketch below).

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.

Some caveats:
  1. It seems like fetchindex may silently decline if it thinks the index
it has is newer.
  2. I'm not doing this on an index that's currently receiving updates.
  3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
  4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)
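
For reference, a hedged sketch of the fetchindex call issued against each
target core (hosts and core names are illustrative):

http://target-host:8983/solr/collection1_shard1_replica1/replication?command=fetchindex&masterUrl=http://source-host:8983/solr/collection1_shard1_replica1/replication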
 

I've got a tool for automating this that I'd like to push to github at
some point, let me know if you're interested.





On 8/16/14, 3:03 AM, Greg Solovyev g...@zimbra.com wrote:

Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
straightforward, but the main concern I have is the internal data format
that ReplicationHandler and SnapPuller use. This new handler as well as
the code that I've already written to download the index files from Solr
will depend on that format. Unfortunately, this format is not documented
and is not abstracted by SolrJ, so I wonder what I can do to make sure it
does not change on us without notice.

Thanks,
Greg

- Original Message -
From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
 What I want to achieve is being able to send the backed up index to
Solr (either standalone or with ZooKeeper) in a way similar to creating
a new Collection. I.e. create a new collection and upload an existing
index directly into that Collection. I've looked through Solr code and
so far I have not found a handler that would allow this scenario. So,
the last idea is to implement a special handler for this case, perhaps
extending CoreAdminHandler. ReplicationHandler together with SnapPuller
do pretty much what I need to do, except that the action has to be
initiated by the receiving Solr server and I need to initiate the action
externally. I.e., instead of having Solr slave download an index from
Solr master, I need to feed the index to Solr master and ideally this
would work the same way in standalone and SolrCloud modes.

I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn



Re: Indexing and Querying MS SQL Server 2012 Spatial

2014-08-20 Thread david.w.smi...@gmail.com
Hi Alex,

I guess a spatial tutorial might be helpful, but there isn’t one.  There is
a sample at the Lucene-spatial layer but not up at Solr.  You need to use
WKT syntax for lines and polys, and you may do so as well for other
shapes.  And in the schema use location_rpt copied from Solr’s example
schema for starters, but modified as the ref guide & wiki show to use JTS.
 The ref guide, wiki, and I would guess that book should show how to do a
bounding box query using {!bbox} — it’s pretty simple.
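
For readers following along, a minimal sketch of the JTS-enabled field type and
a {!bbox} query, adapted from the Solr 4.x example schema (field names, point,
and distance are illustrative):

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
<field name="geo" type="location_rpt" indexed="true" stored="true" multiValued="true" />

<!-- then, e.g.:  q={!bbox sfield=geo pt=38.7,-9.4 d=10}  -->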

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 19, 2014 at 11:25 AM, Bostic, Alex alex.bos...@urs.com wrote:

 Hello, I'm new to Solr.
 I have a SQL Server 2012 database with spatial columns (points/lines/polys).
 Do you have any resources to point to for the following:
 - creating a Solr index of a SQL Server spatial table
 - a bounding box query (intersect) example, possibly with a front-end from
 GMaps or OpenLayers
  I'm currently reading "Apache Solr Beginner's Guide" and have reviewed
 https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
 I am able to index and query my non-spatial data; I am just looking for
 some resource that may have some more detail about how to set everything up.
 I can provide more detail if needed.
 Thanks

 Alex Bostic
 GIS Developer
 URS Corporation
 12420 Milestone Center Drive, Suite 150
 Germantown, MD 20876
 direct line: 301-820-3287
 cell line: 301-213-2639



 This e-mail and any attachments contain URS Corporation confidential
 information that may be proprietary or privileged. If you receive this
 message in error or are not the intended recipient, you should not retain,
 distribute, disclose or use any of this information and you should destroy
 the e-mail and any attachments or copies.



Dynamically loaded core.properties file

2014-08-20 Thread Ryan Josal
Hi all, I have a question about dynamically loading a core properties 
file with the new core discovery method of defining cores.  The concept 
is that I can have a dev.properties file and a prod.properties file, and 
specify which one to load with -Dsolr.env=dev.  This way I can have one 
file which specifies a bunch of runtime properties like external servers 
a plugin might use, etc.


Previously I was able to do this in solr.xml because it can do system 
property substitution when defining which properties file to use for a core.


Now I'm not sure how to do this with core discovery, since the core is 
discovered based on this file, and now the file needs to contain things 
that are specific to that core, like name, which previously were defined 
in the xml definition.


Is there a way I can plugin some code that gets run before any schema or 
solrconfigs are parsed?  That way I could write a property loader that 
adds properties from ${solr.env}.properties to the JVM system properties.


Thanks!
Ryan


Re: Dynamically loaded core.properties file

2014-08-20 Thread Erick Erickson
Hmmm, I was going to make a code change to do this, but Chris
Hostetter saved me from the madness that ensues. Here's his comment on
the JIRA that I did open (but then closed), does this handle your
problem?

I don't think we want to make the name of core.properties be variable
... that way leads to madness and confusion.

the request on the user list was about being able to dynamically load
a property file with diff values between dev & production like you
could do in the old style solr.xml – that doesn't mean core.properties
needs to have a configurable name, it just means there needs to be a
configurable way to load properties.

we already have a "properties" option which can be specified in
core.properties to point to an additional external file that should
also be loaded ... if variable substitution was in play when parsing
core.properties then you could have something like
properties=custom.${env}.properties in core.properties ... but
introducing variable substitution into the core.properties (which solr
both reads & writes based on CoreAdmin calls) brings back the host of
complexities involved when we had persistence of solr.xml as a
feature, with the questions about persisting the original values with
variables in them, vs the values after evaluating variables.

Best,
Erick

On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com wrote:
 Hi all, I have a question about dynamically loading a core properties file
 with the new core discovery method of defining cores.  The concept is that I
 can have a dev.properties file and a prod.properties file, and specify which
 one to load with -Dsolr.env=dev.  This way I can have one file which
 specifies a bunch of runtime properties like external servers a plugin might
 use, etc.

 Previously I was able to do this in solr.xml because it can do system
 property substitution when defining which properties file to use for a core.

 Now I'm not sure how to do this with core discovery, since the core is
 discovered based on this file, and now the file needs to contain things that
 are specific to that core, like name, which previously were defined in the
 xml definition.

 Is there a way I can plugin some code that gets run before any schema or
 solrconfigs are parsed?  That way I could write a property loader that adds
 properties from ${solr.env}.properties to the JVM system properties.

 Thanks!
 Ryan


RE: Indexing and Querying MS SQL Server 2012 Spatial

2014-08-20 Thread Pires, Guilherme
Hello,

I've been working with Solr together with JTS, using the location_rpt class for
the geometry field, for a while now. (However, I must say that the index grew a
lot when we used this class instead of geohash for simple points, so use it
only if you really need to index polylines and/or polygons.)

I actually already successfully connected Solr to PostGIS and Oracle Spatial
via DIH, but on this live website ( http://cascaismap.com ) we had a GE
Smallworld as the GIS system, so it was easier just to build a sync engine that
periodically queries differences from the GIS and pushes them into Solr via XML
documents. This project is already a couple of years old now, so a lot would be
done differently today.

On that website, Solr provides, obviously, all the text search at the top and
also 70% of the themes available in the treeview on the left (expand via the red
button), which are the result of a bounding box query against the geometry index
in Solr. Something like this: (...)q=bounds:"Intersects(-9.463118366688718
38.67913579372146 -9.370549969166746 38.7109390712568)"(...)

After this, for a different project, we provided a similar sync
mechanism between in-house Solr instances and the Google Maps Engine datastore
in the cloud, and it works like a charm.

Guilherme Pires
Geospatial Intelligence @ CGI
guilherme.pi...@cgi.com


De: david.w.smi...@gmail.com [david.w.smi...@gmail.com]
Enviado: quarta-feira, 20 de Agosto de 2014 18:49
Para: solr-user@lucene.apache.org
Assunto: Re: Indexing and Querying MS SQL Server 2012 Spatial

Hi Alex,

I guess a spatial tutorial might be helpful, but there isn’t one.  There is
a sample at the Lucene-spatial layer but not up at Solr.  You need to use
WKT syntax for line’s and polys, and you may do so as well for other
shapes.  And in the schema use location_rpt copied from Solr’s example
schema for starters, but modified as the ref guide  wiki show to use JTS.
 The ref guide, wiki, and I would guess that book should show how to to a
bounding box query using {!bbox} — it’s pretty simple.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 19, 2014 at 11:25 AM, Bostic, Alex alex.bos...@urs.com wrote:

 Hello, I'm new to Solr.
 I have a SQL Server 2012 database with spatial columns (points/lines/polys).
 Do you have any resources to point to for the following:
 - creating a Solr index of a SQL Server spatial table
 - a bounding box query (intersect) example, possibly with a front-end from
 GMaps or OpenLayers
  I'm currently reading "Apache Solr Beginner's Guide" and have reviewed
 https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
 I am able to index and query my non-spatial data; I am just looking for
 some resource that may have some more detail about how to set everything up.
 I can provide more detail if needed.
 Thanks

 Alex Bostic
 GIS Developer
 URS Corporation
 12420 Milestone Center Drive, Suite 150
 Germantown, MD 20876
 direct line: 301-820-3287
 cell line: 301-213-2639



 This e-mail and any attachments contain URS Corporation confidential
 information that may be proprietary or privileged. If you receive this
 message in error or are not the intended recipient, you should not retain,
 distribute, disclose or use any of this information and you should destroy
 the e-mail and any attachments or copies.



RE: Indexing and Querying MS SQL Server 2012 Spatial

2014-08-20 Thread Bostic, Alex
OK, great. I'm just going to dive in and see if I can index my data.  Does the
spatial reference matter?

Alex Bostic
GIS Developer
URS Corporation
12420 Milestone Center Drive, Suite 150
Germantown, MD 20876
direct line: 301-820-3287
cell line: 301-213-2639


-Original Message-
From: Pires, Guilherme [mailto:guilherme.pi...@cgi.com]
Sent: Wednesday, August 20, 2014 4:30 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing and Querying MS SQL Server 2012 Spatial

Hello,

I've been working with Solr together with JTS, using the location_rpt class for
the geometry field, for a while now. (However, I must say that the index grew a
lot when we used this class instead of geohash for simple points, so use it
only if you really need to index polylines and/or polygons.)

I actually already successfully connected Solr to PostGIS and Oracle Spatial
via DIH, but on this live website ( http://cascaismap.com ) we had a GE
Smallworld as the GIS system, so it was easier just to build a sync engine that
periodically queries differences from the GIS and pushes them into Solr via XML
documents. This project is already a couple of years old now, so a lot would be
done differently today.

On that website, Solr provides, obviously, all the text search at the top and
also 70% of the themes available in the treeview on the left (expand via the red
button), which are the result of a bounding box query against the geometry index
in Solr. Something like this: (...)q=bounds:"Intersects(-9.463118366688718
38.67913579372146 -9.370549969166746 38.7109390712568)"(...)

After this, for a different project, we provided a similar sync
mechanism between in-house Solr instances and the Google Maps Engine datastore
in the cloud, and it works like a charm.

Guilherme Pires
Geospatial Intelligence @ CGI
guilherme.pi...@cgi.com


De: david.w.smi...@gmail.com [david.w.smi...@gmail.com]
Enviado: quarta-feira, 20 de Agosto de 2014 18:49
Para: solr-user@lucene.apache.org
Assunto: Re: Indexing and Querying MS SQL Server 2012 Spatial

Hi Alex,

I guess a spatial tutorial might be helpful, but there isn't one.  There is a
sample at the Lucene-spatial layer but not up at Solr.  You need to use WKT
syntax for lines and polys, and you may do so as well for other shapes.  And
in the schema use location_rpt copied from Solr's example schema for starters,
but modified as the ref guide & wiki show to use JTS.
 The ref guide, wiki, and I would guess that book should show how to do a
bounding box query using {!bbox} - it's pretty simple.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer 
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 19, 2014 at 11:25 AM, Bostic, Alex alex.bos...@urs.com wrote:

 Hello, I'm new to Solr.
 I have a SQL Server 2012 database with spatial columns
 (points/lines/polys). Do you have any resources to point to for the
 following: creating a Solr index of a SQL Server spatial table; a bounding
 box query (intersect) example, possibly with a front-end from GMaps or
 OpenLayers? I'm currently reading "Apache Solr Beginner's Guide" and
 have reviewed
 https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
 I am able to index and query my non-spatial data; I am just looking
 for some resource that may have some more detail about how to set everything 
 up.
 I can provide more detail if needed.
 Thanks

 Alex Bostic
 GIS Developer
 URS Corporation
 12420 Milestone Center Drive, Suite 150 Germantown, MD 20876 direct
 line: 301-820-3287 cell line: 301-213-2639



 This e-mail and any attachments contain URS Corporation confidential
 information that may be proprietary or privileged. If you receive this
 message in error or are not the intended recipient, you should not
 retain, distribute, disclose or use any of this information and you
 should destroy the e-mail and any attachments or copies.



This e-mail and any attachments contain URS Corporation confidential 
information that may be proprietary or privileged. If you receive this message 
in error or are not the intended recipient, you should not retain, distribute, 
disclose or use any of this information and you should destroy the e-mail and 
any attachments or copies.


Re: Dynamically loaded core.properties file

2014-08-20 Thread Ryan Josal
Thanks Erick, that mirrors my thoughts exactly.  If core.properties had 
property expansion it would work for this, but I agree with not 
supporting that for the complexities it introduces, and I'm not sure 
it's the right way to solve it anyway.  So, it doesn't really handle my 
problem.


I think because the properties file I want to load is not actually 
related to any core, it makes it easier to solve.  So if solr.xml is no 
longer rewritten then it seems like a global properties file could 
safely be specified there using property expansion.  Or maybe there is 
some way to write some code that could get executed before schema and 
solrconfig are parsed, although I'm not sure how that would work given 
how you need solrconfig to load the libraries and define plugins.


Ryan

On 08/20/2014 01:07 PM, Erick Erickson wrote:

Hmmm, I was going to make a code change to do this, but Chris
Hostetter saved me from the madness that ensues. Here's his comment on
the JIRA that I did open (but then closed), does this handle your
problem?

I don't think we want to make the name of core.properties be variable
... that way leads to madness and confusion.

the request on the user list was about being able to dynamically load
a property file with diff values between dev & production like you
could do in the old style solr.xml – that doesn't mean core.properties
needs to have a configurable name, it just means there needs to be a
configurable way to load properties.

we already have a "properties" option which can be specified in
core.properties to point to an additional external file that should
also be loaded ... if variable substitution was in play when parsing
core.properties then you could have something like
properties=custom.${env}.properties in core.properties ... but
introducing variable substitution into the core.properties (which solr
both reads & writes based on CoreAdmin calls) brings back the host of
complexities involved when we had persistence of solr.xml as a
feature, with the questions about persisting the original values with
variables in them, vs the values after evaluating variables.

Best,
Erick

On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com wrote:

Hi all, I have a question about dynamically loading a core properties file
with the new core discovery method of defining cores.  The concept is that I
can have a dev.properties file and a prod.properties file, and specify which
one to load with -Dsolr.env=dev.  This way I can have one file which
specifies a bunch of runtime properties like external servers a plugin might
use, etc.

Previously I was able to do this in solr.xml because it can do system
property substitution when defining which properties file to use for a core.

Now I'm not sure how to do this with core discovery, since the core is
discovered based on this file, and now the file needs to contain things that
are specific to that core, like name, which previously were defined in the
xml definition.

Is there a way I can plugin some code that gets run before any schema or
solrconfigs are parsed?  That way I could write a property loader that adds
properties from ${solr.env}.properties to the JVM system properties.

Thanks!
Ryan




Re: Unload collection in SolrCloud

2014-08-20 Thread didier deshommes
I added a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-6399


On Thu, May 22, 2014 at 4:16 PM, Erick Erickson erickerick...@gmail.com
wrote:

  "Age out" in this context is just implementing an LRU cache for open
  cores. When the cache limit is exceeded, the oldest core is closed
 automatically.

 Best,
 Erick

 On Thu, May 22, 2014 at 10:27 AM, Saumitra Srivastav
 saumitra.srivast...@gmail.com wrote:
  Eric,
 
   Can you elaborate more on what you mean by "age out"?
 
 
 
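For reference, the LRU cache Erick describes is Solr's transient cores support;
a minimal sketch, assuming 4.x-style core discovery (values illustrative):

<!-- solr.xml -->
<solr>
  <int name="transientCacheSize">50</int>
</solr>

# core.properties of a core that may be aged out
name=core_a
transient=true
loadOnStartup=false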



Re: Dynamically loaded core.properties file

2014-08-20 Thread Erick Erickson
OK, not quite sure if this would work, but

In each core.properties file, put in a line similar to what Chris suggested:
properties=${env}/custom.properties

You might be able to now define your sys var like
-Denv=<relative_or_absolute_path_to_the_dev_custom.properties_dir>
or
-Denv=<relative_or_absolute_path_to_the_prod_custom.properties_dir>
on Solr startup. Then in the custom.properties file you have whatever
you need to define to make the prod/dev distinction you need.
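
A sketch of that layout, assuming the pathing works as hoped (paths, property
names, and values are illustrative and untested):

# core.properties (per core, found by core discovery)
name=mycore
properties=${env}/custom.properties

# /opt/solr/envs/dev/custom.properties
plugin.server.url=http://dev-services.example.com/

# startup:
java -Denv=/opt/solr/envs/dev -jar start.jar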

WARNING: I'm not entirely sure that relative pathing works here, which
just means I haven't tried it.

Best,
Erick

On Wed, Aug 20, 2014 at 3:11 PM, Ryan Josal ry...@pointinside.com wrote:
 Thanks Erick, that mirrors my thoughts exactly.  If core.properties had
 property expansion it would work for this, but I agree with not supporting
 that for the complexities it introduces, and I'm not sure it's the right way
 to solve it anyway.  So, it doesn't really handle my problem.

 I think because the properties file I want to load is not actually related
 to any core, it makes it easier to solve.  So if solr.xml is no longer
 rewritten then it seems like a global properties file could safely be
 specified there using property expansion.  Or maybe there is some way to
 write some code that could get executed before schema and solrconfig are
 parsed, although I'm not sure how that would work given how you need
 solrconfig to load the libraries and define plugins.

 Ryan


 On 08/20/2014 01:07 PM, Erick Erickson wrote:

 Hmmm, I was going to make a code change to do this, but Chris
 Hostetter saved me from the madness that ensues. Here's his comment on
 the JIRA that I did open (but then closed), does this handle your
 problem?

 I don't think we want to make the name of core.properties be variable
 ... that way leads to madness and confusion.

 the request on the user list was about being able to dynamically load
 a property file with diff values between dev & production like you
 could do in the old style solr.xml – that doesn't mean core.properties
 needs to have a configurable name, it just means there needs to be a
 configurable way to load properties.

 we already have a "properties" option which can be specified in
 core.properties to point to an additional external file that should
 also be loaded ... if variable substitution was in play when parsing
 core.properties then you could have something like
 properties=custom.${env}.properties in core.properties ... but
 introducing variable substitution into the core.properties (which solr
 both reads & writes based on CoreAdmin calls) brings back the host of
 complexities involved when we had persistence of solr.xml as a
 feature, with the questions about persisting the original values with
 variables in them, vs the values after evaluating variables.

 Best,
 Erick

 On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com
 wrote:

 Hi all, I have a question about dynamically loading a core properties
 file
 with the new core discovery method of defining cores.  The concept is
 that I
 can have a dev.properties file and a prod.properties file, and specify
 which
 one to load with -Dsolr.env=dev.  This way I can have one file which
 specifies a bunch of runtime properties like external servers a plugin
 might
 use, etc.

 Previously I was able to do this in solr.xml because it can do system
 property substitution when defining which properties file to use for a
 core.

 Now I'm not sure how to do this with core discovery, since the core is
 discovered based on this file, and now the file needs to contain things
 that
 are specific to that core, like name, which previously were defined in
 the
 xml definition.

 Is there a way I can plugin some code that gets run before any schema or
 solrconfigs are parsed?  That way I could write a property loader that
 adds
 properties from ${solr.env}.properties to the JVM system properties.

 Thanks!
 Ryan




embedded documents

2014-08-20 Thread Michael Pitsounis
Hello everybody,

I had a requirement to store complicated json documents in solr.

I have modified the JsonLoader to accept complicated json documents with
arrays/objects as values.

It stores the object/array and then flattens it and indexes the fields.

e.g., a basic example document:

  {
    "titles_json":{"FR":"This is the FR title", "EN":"This is the EN title"},
    "id": 103,
    "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
  }

It will store "titles_json":{"FR":"This is the FR title", "EN":"This is the EN title"}
and then index the fields

titles.FR:"This is the FR title"
titles.EN:"This is the EN title"


Do you see any problems with this approach?



Regards,
Michael Pitsounis


Re: Dynamically loaded core.properties file

2014-08-20 Thread Umesh Prasad
The core discovery process depends on the presence of a core.properties file
in the particular directory.

You can have a script which traverses the directory structure of the core
base directory and, depending on the env/host name, either restores
core.properties or renames it to a different file.

The script will have to run before Solr starts. So Solr will see the
directory structures, but core.properties will be missing from directories
which you do not want to load (renamed to core.properties.bkp).

We are already using this approach to control core discovery in prod (we
have 40-plus cores and we co-host only a couple of them on a single
server).
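
A minimal sketch of such a pre-start script (core names and paths are
illustrative):

#!/bin/sh
# Run before Solr starts: hide cores this host should not discover.
for core in core_a core_b; do
  mv "/var/solr/cores/$core/core.properties" \
     "/var/solr/cores/$core/core.properties.bkp"
done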




On 21 August 2014 04:41, Erick Erickson erickerick...@gmail.com wrote:

 OK, not quite sure if this would work, but

 In each core.properties file, put in a line similar to what Chris
 suggested:
 properties=${env}/custom.properties

 You might be able to now define your sys var like
  -Denv=<relative_or_absolute_path_to_the_dev_custom.properties_dir>
  or
  -Denv=<relative_or_absolute_path_to_the_prod_custom.properties_dir>
 on Solr startup. Then in the custom.properties file you have whatever
 you need to define to make the prod/dev distinction you need.

 WARNING: I'm not entirely sure that relative pathing works here, which
 just means I haven't tried it.

 Best,
 Erick

 On Wed, Aug 20, 2014 at 3:11 PM, Ryan Josal ry...@pointinside.com wrote:
  Thanks Erick, that mirrors my thoughts exactly.  If core.properties had
  property expansion it would work for this, but I agree with not
 supporting
  that for the complexities it introduces, and I'm not sure it's the right
 way
  to solve it anyway.  So, it doesn't really handle my problem.
 
  I think because the properties file I want to load is not actually
 related
  to any core, it makes it easier to solve.  So if solr.xml is no longer
  rewritten then it seems like a global properties file could safely be
  specified there using property expansion.  Or maybe there is some way to
  write some code that could get executed before schema and solrconfig are
  parsed, although I'm not sure how that would work given how you need
  solrconfig to load the libraries and define plugins.
 
  Ryan
 
 
  On 08/20/2014 01:07 PM, Erick Erickson wrote:
 
  Hmmm, I was going to make a code change to do this, but Chris
  Hostetter saved me from the madness that ensues. Here's his comment on
  the JIRA that I did open (but then closed), does this handle your
  problem?
 
  I don't think we want to make the name of core.properties be variable
  ... that way leads to madness and confusion.
 
  the request on the user list was about being able to dynamically load
  a property file with diff values between dev & production like you
  could do in the old style solr.xml – that doesn't mean core.properties
  needs to have a configurable name, it just means there needs to be a
  configurable way to load properties.
 
  we already have a "properties" option which can be specified in
  core.properties to point to an additional external file that should
  also be loaded ... if variable substitution was in play when parsing
  core.properties then you could have something like
  properties=custom.${env}.properties in core.properties ... but
  introducing variable substitution into the core.properties (which solr
  both reads & writes based on CoreAdmin calls) brings back the host of
  complexities involved when we had persistence of solr.xml as a
  feature, with the questions about persisting the original values with
  variables in them, vs the values after evaluating variables.
 
  Best,
  Erick
 
  On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com
  wrote:
 
  Hi all, I have a question about dynamically loading a core properties
  file
  with the new core discovery method of defining cores.  The concept is
  that I
  can have a dev.properties file and a prod.properties file, and specify
  which
  one to load with -Dsolr.env=dev.  This way I can have one file which
  specifies a bunch of runtime properties like external servers a plugin
  might
  use, etc.
 
  Previously I was able to do this in solr.xml because it can do system
  property substitution when defining which properties file to use for a
  core.
 
  Now I'm not sure how to do this with core discovery, since the core is
  discovered based on this file, and now the file needs to contain things
  that
  are specific to that core, like name, which previously were defined in
  the
  xml definition.
 
  Is there a way I can plug in some code that gets run before any schema
 or
  solrconfigs are parsed?  That way I could write a property loader that
  adds
  properties from ${solr.env}.properties to the JVM system properties.
 
  Thanks!
  Ryan
 
 




-- 
Thanks & Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: logging in solr

2014-08-20 Thread Umesh Prasad
Or you could use system properties to control that.

For example, if you are using Logback, then

JAVA_OPTS="$JAVA_OPTS -Dlogback.configurationFile=$CATALINA_BASE/conf/logback.xml"

will do it.
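
For illustration, a minimal logback.xml along those lines might look like this (the log path and pattern are assumptions; adjust to your layout):

  <configuration>
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
      <file>${catalina.base}/logs/solr.log</file>
      <encoder>
        <pattern>%d %-5level [%thread] %logger{36} - %msg%n</pattern>
      </encoder>
    </appender>
    <root level="INFO">
      <appender-ref ref="FILE" />
    </root>
  </configuration>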




On 20 August 2014 03:15, Aman Tandon amantandon...@gmail.com wrote:

 As you are using Tomcat, you can configure the log file name, folder, etc. by
 configuring the server.xml present in the conf directory of Tomcat.
 On Aug 19, 2014 4:17 AM, Shawn Heisey s...@elyograg.org wrote:

  On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
   Currently in my component Solr is logging to catalina.out. What
  is the configuration needed to redirect those logs to some custom logfile
  e.g. Solr.log.
 
  Solr uses the slf4j library for logging.  Simply change your program to
  use slf4j, and very likely the logs will go to the same place the Solr
  logs do.
 
  http://www.slf4j.org/manual.html
 
  See also the wiki page on logging jars and Solr:
 
  http://wiki.apache.org/solr/SolrLogging
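
As a minimal sketch of that advice (the class name and message are illustrative):

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class MyComponent {
      // slf4j binds to whichever backend (log4j, logback, ...) is on the
      // classpath, so these messages end up wherever the Solr logs go
      private static final Logger log = LoggerFactory.getLogger(MyComponent.class);

      public void doWork() {
          log.info("processing request");
      }
  }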
 
  Thanks,
  Shawn
 
 




-- 
Thanks & Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Substring and Case In sensitive Search

2014-08-20 Thread Umesh Prasad
Wildcard queries, and especially leading-wildcard queries,
can be quite slow.

http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/WildcardQuery.html

Also, you won't be able to time them out.

Take a look at ReversedWildcardFilter

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html

This blog post describes it nicely:

http://solr.pl/en/2011/10/10/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-solr-reversedwildcardfilter-%E2%80%93-lets-optimize-wildcard-queries-part-8/
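
A fieldType along these lines would enable it (a sketch; the attribute values are commonly used settings, not requirements):

  <fieldType name="text_rev" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The filter sits in the index analyzer only; Solr's query parser detects it and rewrites leading-wildcard queries against the reversed tokens.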



On 19 August 2014 22:19, Jack Krupansky j...@basetechnology.com wrote:

 Substring search a string field using wildcard, *, at beginning and end
 of query term.

 Case-insensitive matching on a string field is not supported.

 Instead, copy the string field to a text field, use the keyword tokenizer,
 and then apply the lower case filter.

 But... review your use case to confirm whether you really need to use
 a string as opposed to a text field.
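
As a sketch of that copyField approach (field and type names are illustrative assumptions):

  <fieldType name="string_ci" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="title_ci" type="string_ci" indexed="true" stored="false"/>
  <copyField source="title" dest="title_ci"/>

A case-insensitive substring match is then a wildcard query such as title_ci:*kite*.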

 -- Jack Krupansky

 -Original Message- From: Nishanth S
 Sent: Tuesday, August 19, 2014 12:03 PM
 To: solr-user@lucene.apache.org
 Subject: Substring and Case In sensitive Search


 Hi,

 I am very new to Solr. How can I make Solr search on a string field
 case-insensitively and by substring?

 Thanks,
 Nishanth




-- 
Thanks & Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Grouping based on multiple filters/criterias

2014-08-20 Thread Umesh Prasad
Grouping supports group-by-query (the group.query parameter).

https://cwiki.apache.org/confluence/display/solr/Result+Grouping

However, you will need to form the group queries beforehand.
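
For example, with parameters along these lines (field names are assumptions taken from the question below, and every group.query has to be built up front, one per user):

  q=*:*&group=true
  &group.query=source_user:deniz AND timestamp:[NOW-1HOUR TO NOW]
  &group.query=source_user:other_user_1
  &group.limit=10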







On 18 August 2014 12:47, deniz denizdurmu...@gmail.com wrote:

 is it possible to have multiple filters/criteria on grouping? I am trying to
 do something like this, and judging from the statuses of the tickets below,
 it doesn't seem possible?

 https://issues.apache.org/jira/browse/SOLR-2553
 https://issues.apache.org/jira/browse/SOLR-2526
 https://issues.apache.org/jira/browse/LUCENE-3257

 To make everything clear, here is details which I am planning to do with
 Solr...

 so there is an activity feed on a site and it basically works like the
 Facebook or LinkedIn newsfeed, though there is no relationship between
 users: it doesn't matter if I am following someone or not; as long as their
 settings allow me to see their posts and they hit my search filter, I will
 see their posts.

 the part related to grouping is tricky... so let's assume that you are able
 to see my posts, and I have posted 8 activities in the last hour; those
 activities should appear differently from other posts, as a combined view of
 the posts...

 i.e.

 <deniz>
   <activity one>
   <activity two>
   ...
   <activity eight>
 </deniz>
 <other user 1>
   <single activity>
 </other user 1>
 <another user 1>
   <single activity>
 </another user 1>
 <other user 2>
   <activity one>
   <activity two>
 </other user 2>

 So here the results should be grouped depending on their post times...

 on solr (4.7.2), i am indexing activities as documents, and each document
 has bunch of fields including timestamp and source_user etc etc.

 is it possible to do this on current solr?

 (in case the details are not clear, please feel free to ask for more
 details
 :) )







 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Grouping-based-on-multiple-filters-criterias-tp4153462.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks & Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Selectively setting the number of returned SOLR rows per field based on field value

2014-08-20 Thread Umesh Prasad
Field collapsing has a limitation: currently it will not allow you to get a
different number of results from each group.

You can plug in a custom AnalyticsQuery, which can do exactly what you want
after seeing each matching document.
https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API
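
A skeletal sketch of that API (class names are illustrative, and the filtering logic is left as a comment; you would also need a small QParserPlugin to create the query):

  import java.io.IOException;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.search.AnalyticsQuery;
  import org.apache.solr.search.DelegatingCollector;

  public class PerTitleLimitQuery extends AnalyticsQuery {
      @Override
      public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb,
                                                       IndexSearcher searcher) {
          return new PerTitleLimitCollector();
      }
  }

  class PerTitleLimitCollector extends DelegatingCollector {
      @Override
      public void collect(int doc) throws IOException {
          // inspect each matching doc here (e.g. count docs per title) and
          // only call super.collect(doc) for the ones you want to keep
          super.collect(doc);
      }
  }

It would then be invoked with something like fq={!perTitleLimit} (the parser name is hypothetical).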




On 18 August 2014 04:32, Erick Erickson erickerick...@gmail.com wrote:

 Aurélien is correct: for the exact behavior you're looking
 for you'd need to run multiple queries.

 But you might be able to make do with field collapsing.
 You'd probably have to copyField from title to
 title_grouping which would be un-analyzed (string type
 or KeywordTokenizer), then group on _that_ field.
 You'd get back the top N matches grouped by title and
 your app could display that info however it made sense.

 Grouping sometimes goes by the name field collapsing, FWIW.
 Erick
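
A sketch of Erick's grouping variant (field names are illustrative assumptions):

  q=*:*&group=true&group.field=title_grouping&group.limit=1

Note the limitation mentioned above, though: group.limit applies to every group alike, so a single grouped request cannot keep one row for two titles but all eight rows for the third.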

 On Sun, Aug 17, 2014 at 2:16 PM, talt mikaelsaltz...@gmail.com wrote:
  I have a field in my SOLR index, let's call it book_title.
 
  A query returns 15 rows with book_title:The Kite Runner, 13 rows with
  book_title:The Stranger, and 8 rows with book_title:The Ruby Way.
 
  Is there a way to return only the first row of The Kite Runner and The
  Stranger, but all of the The Ruby Way rows from the previous query
  result? This would result in 10 rows altogether. Is this possible at all,
  using a single query?
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Selectively-setting-the-number-of-returned-SOLR-rows-per-field-based-on-field-value-tp4153441.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/