Re: SolrCloud RAM requirements
On 9/24/2014 2:18 AM, Toke Eskildsen wrote:
> Norgorn [lsunnyd...@mail.ru] wrote:
>> I have CLOUD with 3 nodes and 16 GB RAM on each. My index is about 1 TB and search speed is awfully bad.
>
> We all have different standards with regard to search performance. What is awfully bad and what is good enough for you? Related to this: how many documents are in your index, how do you query (faceting, sorting, special searches), and how often is the index updated?
>
>> I've read that one needs at least 50% of index size in RAM.
>
> That is the common advice, yes. The advice is not bad for some use cases. The problem is that it has become gospel.
>
> I am guessing that you are using spinning drives? Solr needs fast random-access reads, and spinning drives are very slow at that. You can either compensate by buying enough RAM or change to a faster underlying storage technology. The obvious choice these days is Solid State Drives (we bought Samsung 840 EVOs last time and would probably buy those again). They will not give you RAM speed, but they do give a lot more bang for the buck, and depending on your performance requirements they can be enough.

I am guilty of spreading the gospel that you need 50-100% of your index to fit in the OS disk cache, as Toke mentioned. This wiki page is my creation: http://wiki.apache.org/solr/SolrPerformanceProblems

I've seen decent performance out of systems with standard hard disks that only had enough RAM to fit about 25% of the index into the disk cache, but I've also seen systems with 50% that can't complete a simple query in less than 10 seconds. With a terabyte of index on the system (assuming that's how much is on each one), 25% is still at least 256GB of RAM. With only 16GB, there's simply no way you'll ever get good performance.

I've heard quite a lot of anecdotal evidence that if you put the index on SSD, you only need 10% of the index to fit in RAM. I'm a little bit skeptical that this would be true as a general rule, but I do not doubt that it's been done successfully. For a terabyte index, that's still 100GB of RAM, so 128GB would be the absolute minimum you'll want to consider. The more RAM you can throw at this problem, the better your performance will be.

Thanks,
Shawn
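The rules of thumb in this thread reduce to simple arithmetic; a small sketch (the percentages are the heuristics quoted above, not guarantees):

```python
# Rough OS disk-cache sizing for a Solr index, using the rules of thumb
# discussed in this thread. These percentages are heuristics, not guarantees.

def cache_targets(index_bytes):
    """Return suggested disk-cache sizes (bytes) for spinning disk vs. SSD."""
    return {
        "spinning_minimum": index_bytes * 0.25,      # sometimes workable
        "spinning_comfortable": index_bytes * 0.50,  # the common advice
        "ideal": index_bytes * 1.00,                 # whole index cached
        "ssd_anecdotal": index_bytes * 0.10,         # anecdotal SSD figure
    }

TB = 1024 ** 4
GB = 1024 ** 3
for name, size in cache_targets(1 * TB).items():
    print(f"{name}: {size / GB:.0f} GB")
```

For a 1 TB index this reproduces Shawn's numbers: 256 GB at 25%, and roughly 100 GB for the anecdotal SSD case.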
Using SolrCloud on Amazon EC2
Hi all,

We currently plan to set up a project based on SolrCloud and Amazon Web Services. Our main search application is deployed using AWS OpsWorks, which works out quite well. Since we also want to provision Solr to EC2, I want to ask for experiences with the different deployment/provisioning tools. Right now I see the following three approaches:

1. Using the Lucidworks solr-scale-tk to set up and maintain the cluster. Who is using this in production, and what are your experiences?
2. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud as a custom OpsWorks layer. Did somebody do this already? What are your experiences? Are there any cookbooks out there we can contribute to and reuse?
3. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud as a Docker container. Any experiences with this?

Do you see other options? AFAIK, Elastic Beanstalk could also be an option. It would be very nice to get some experiences and recommendations.

Cheers
Timo
Help needed in Indexing and Search on xml content
Hi Team,

I am a newbie to Solr. I have search fields stored in an XML file, which is stored in MSSQL. I want to index the content of the XML file in Solr, and we need to provide search based on the fields present in the XML file.

The reason we are storing the input details as an XML file is that users will be able to add custom input fields of their own, with values. Storing these custom fields as columns in MSSQL does not seem to be an optimal solution, so we thought of putting them in an XML file and storing that file in the RDBMS. But I am not sure how we can index the content of the file to make search better. I believe this can be done with ExtractingRequestHandler. Could someone help me with how to implement this, or direct me to some pages that could be of help?

Thanks
Sangeetha
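One common alternative to ExtractingRequestHandler (an assumption about the setup, not the only way) is to parse the XML on the application side and map the user-defined fields onto Solr dynamic fields. A minimal sketch, where the element layout and the `*_s` string-suffix convention are illustrative only:

```python
# Sketch: flatten user-defined fields from a stored XML blob into a Solr
# document that relies on dynamic fields (e.g. a <dynamicField name="*_s">
# rule in schema.xml). The XML layout and "_s" suffix are assumptions.
import xml.etree.ElementTree as ET

def xml_to_solr_doc(doc_id, xml_text):
    root = ET.fromstring(xml_text)
    doc = {"id": doc_id}
    for field in root.findall("field"):
        name = field.get("name")
        doc[f"{name}_s"] = field.text  # lands on the "*_s" dynamic field
    return doc

sample = "<doc><field name='color'>red</field><field name='size'>XL</field></doc>"
print(xml_to_solr_doc("42", sample))
# -> {'id': '42', 'color_s': 'red', 'size_s': 'XL'}
```

The resulting dict can be posted to Solr as a regular JSON update; no schema change is needed when users invent new field names.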
(auto)suggestions, but only from a filtered set of documents
What I'd like to do is:

http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:mysource

Through qf (or whatever the parameter should be called) I'd like to restrict the suggestions to documents which match the given qf query. I need this filter because (as posted in a previous thread) I intend to put different kinds of data into one core/collection, so suggestions should be restrictable to one or many source(s).
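For what it's worth, a sketch of building that request with properly separated and escaped parameters (the `qf` parameter itself is the poster's hypothetical, not an existing suggester option):

```python
# Build the suggest URL with separated, escaped parameters.
# "qf=source:mysource" is the hypothetical filter parameter from the post,
# not a real suggester option.
from urllib.parse import urlencode

params = {"q": "atm", "qf": "source:mysource"}
url = "http://localhost:8983/solr/solrpedia/suggest?" + urlencode(params)
print(url)
```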
Re: SolrCloud RAM requirements
On Thu, 2014-09-25 at 06:29 +0200, Norgorn wrote:
> I can't say for sure, cause filter caches are out of the JVM (dat HS), but top shows 5 GB cached and no free RAM.

The "cached" reported by top should be correct, no matter whether one uses off-heap or not: you have 5GB of cache for (I guess) a 300GB index, so 1.5% of the index size. I agree fully with Shawn that this will never perform for interactive use when you're using spinning drives.

> The only question for me now is how to balance disk cache and filter cache? Do I need to worry about that, or is a big disk cache enough?

Even if you skipped the filters fully (so just simple queries) and magically had 15GB of the 16GB free for disk cache, it would only be 5% of the index size. Still not enough for decent performance with spinning drives, unless your index is very special, e.g. an enormous amount of stored fields.

As for the whole "how much will it help with SSDs?" question, might I suggest simply testing? Buy a 500GB SSD and put it in one of the machines, then test searches against that shard vs. the shards on the other machines. If you do not see much difference, move the drive to your developer machine and be happy for the upgrade. Win-win.

> And does an optimized index mean the SOLR optimize command, or something else?

Optimized down to a single segment (which I think the 'optimize' command will do). But you should only consider that if you know that your shard will not be updated in the foreseeable future.

- Toke Eskildsen, State and University Library, Denmark
Re: traversing Automaton in lucene 4.10
Case solved: an example of the traversal was found in Lucene's source code (pointed to by Mike McCandless): https://github.com/apache/lucene-solr/blob/2836bd99101026860b12233a87e35101769a538f/lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java#L535

On Fri, Sep 19, 2014 at 5:27 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> Hi,
> The o.a.l.u.automaton.Automaton API has changed in Lucene 4.10 (https://issues.apache.org/jira/secure/attachment/12651171/LUCENE-5752.patch). The method getNumberedStates() was dropped and the class State does not exist anymore. How do I traverse an Automaton with the new API?
> Dmitry

-- Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
/suggest through SolrJ?
Am I right that I cannot call /suggest (i.e. the corresponding RequestHandler) through SolrJ? What is the preferred way to call Solr handlers/operations not supported by SolrJ from Java? Through new SolrJ Request classes?
Turn off suggester
Is there a way to turn off the Solr suggester? I have about 30M records, and when Tomcat starts up it takes a long time (~10 minutes) for the suggester to decompress the data, or whatever it is doing while it hangs in SolrSuggester.build(). Any ideas, please?

Thanks
-Peri

*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global Services to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
SolrCloud Slow to boot up
Hello all,

We host a SolrCloud: 6 nodes, 36 shards x 3 replicas each, 108 cores across 6 servers, with about 250M documents moved into the cluster. When I restart the cluster, only the leader of each shard comes up live instantly (within a minute); all the replicas are shown as Recovering on the Cloud screen, and all 6 servers are doing some processing (consuming about 4 CPUs and doing a lot of network IO too). In essence it's not doing any reads or writes to the index, and I don't see any replication/catch-up activity going on either, yet RAM usage grows to consume all 96GB available on each box. All the recovering replicas then recover one by one in about an hour or so.

Why is it taking so long to boot up, and what is it doing that consumes so much CPU, RAM, and network IO? All disks are reading at 100% on all servers during this boot-up. Is there a setting I might have missed that would help? FYI, the ZooKeeper cluster is on the same 6 boxes. The Solr data dir is about 150GB per server, and each box has 96GB RAM.

Thanks, Anand

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Scoring with wildcards
The wildcard query is "constant score" to make it faster, so unfortunately that means there is no score differentiation between the wildcard matches. You can simply add the exact term as a separate query term and boost it:

q=text:carre* text:carre^1.5

-- Jack Krupansky

From: Pigeyre Romain
Sent: Wednesday, September 24, 2014 2:12 PM
To: solr-user@lucene.apache.org
Cc: Pigeyre Romain
Subject: Scoring with wildcards

Hi,

I have two records with a name_FRA field, one with name_FRA="un test CARREAU" and another one with name_FRA="un test CARRE":

{ "codeBarre": "1", "name_FRA": "un test CARREAU" }
{ "codeBarre": "2", "name_FRA": "un test CARRE" }

The configuration of these fields is:

<field name="name_FRA" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="codeBarre" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="name_FRA" dest="text"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

When I'm using this query:

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre*&fl=score%2C+*&wt=json&indent=true&debugQuery=true

the result is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "debugQuery": "true",
      "fl": "score, *",
      "indent": "true",
      "q": "text:carre*",
      "wt": "json"}},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
      {"codeBarre": "1",
       "name_FRA": "un test CARREAU",
       "_version_": 1480150860842401792,
       "score": 1.0},
      {"codeBarre": "2",
       "name_FRA": "un test CARRE",
       "_version_": 1480150875738472448,
       "score": 1.0}]
  },
  "debug": {
    "rawquerystring": "text:carre*",
    "querystring": "text:carre*",
    "parsedquery": "text:carre*",
    "parsedquery_toString": "text:carre*",
    "explain": {
      "1": "\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "2": "\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n"},
    "QParser": "LuceneQParser",
    "timing": {
      "time": 2.0,
      "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0}, "mlt": {"time": 0.0}, "highlight": {"time": 0.0}, "stats": {"time": 0.0}, "expand": {"time": 0.0}, "debug": {"time": 0.0}},
      "process": {"time": 1.0, "query": {"time": 0.0}, "facet": {"time": 0.0}, "mlt": {"time": 0.0}, "highlight": {"time": 0.0}, "stats": {"time": 0.0}, "expand": {"time": 0.0}, "debug": {"time": 1.0}}}}}

The score is the same for both records; the CARREAU record comes first and CARRE next. I want to place CARRE before CARREAU because CARRE is an exact match. Is that possible?

NB: scoring for this query only uses queryNorm and boosts.

In this test:

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre&fl=score%2C*&wt=json&indent=true&debugQuery=true

I have only one record found, but the scoring is more complex. Why?

{
  "responseHeader": {"status": 0, "QTime": 2, "params": {
      "debugQuery": "true",
      "fl": "score,*",
      "indent": "true",
      "q": "text:carre",
      "wt": "json"}},
  "response": {"numFound": 1, "start": 0, "maxScore": 0.53033006, "docs": [
      {"codeBarre": "2", "name_FRA": "un test CARRE",
       "_version_": 1480150875738472448, "score": 0.53033006}]
  },
  "debug": {
    "rawquerystring": "text:carre",
    "querystring": "text:carre",
    "parsedquery": "text:carre",
    "parsedquery_toString": "text:carre",
    "explain": {
      "2": "\n0.53033006 = (MATCH) weight(text:carre in 0) [DefaultSimilarity], result of:\n  0.53033006 = fieldWeight in 0, product of:\n    1.4142135 = tf(freq=2.0), with freq of:\n      2.0 = termFreq=2.0\n    1.0 = idf(docFreq=1, maxDocs=2)\n    0.375 = fieldNorm(doc=0)\n"},
    "QParser": "LuceneQParser",
    "timing": {"time": 2.0, "prepare": {
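Jack's suggestion can be sketched as plain query-URL construction (collection and field names are taken from the thread; the 1.5 boost value is arbitrary):

```python
# Pair the constant-score wildcard with a boosted exact term so that exact
# matches ("CARRE") sort above mere prefix matches ("CARREAU").
# Collection/field names come from the thread; the boost 1.5 is arbitrary.
from urllib.parse import urlencode

params = {
    "q": "text:carre* text:carre^1.5",  # wildcard OR boosted exact term
    "fl": "score,*",
    "wt": "json",
}
url = "http://localhost:8983/solr/cdv_product/select?" + urlencode(params)
print(url)
```

The wildcard clause still contributes a constant score for every match, but the documents that also match the exact term pick up the extra boost and rank first.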
Re: Help needed in Indexing and Search on xml content
Hi Sangeetha,

If you can tell me a little bit more about your setup, I can try and help. If you are on Skype, that would be the easiest.

Thanks
-Peri

On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote:
> Hi Team, I am a newbie to SOLR. I have got search fields stored in a xml file which is stored in MSSQL. I want to index on the content of the xml file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as XML file is, the users will be able to add custom input fields on their own with values. Storing these custom fields as columns in MSSQL seems to be not an optimal solution. So we thought of putting it in XML file and store that file in RDBMS. But I am not sure on how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this / direct me to some pages which could be of help to me?
> Thanks Sangeetha

--- This message has been scanned for viruses and dangerous content by HTC E-Mail Virus Protection Service.
Re: Changed behavior in solr 4 ??
I am not aware of any such feature! That doesn't mean it doesn't exist, but I don't recall seeing it in the Solr source code.

-- Jack Krupansky

-----Original Message----- From: Jorge Luis Betancourt Gonzalez Sent: Wednesday, September 24, 2014 1:31 AM To: solr-user@lucene.apache.org Subject: Re: Changed behavior in solr 4 ??

Hi Jack:

Thanks for the response. Yes, the way you describe it, I know it works, and that is how I got it to work, but then what does the snippet of the documentation about overriding the default components shipped with Solr mean? Even in the book Solr in Action, chapter 7, listing 7.3, I saw something similar to what I wanted to do:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="invariants">
    <str name="rows">25</str>
    <str name="df">content_field</str>
  </lst>
  <lst name="defaults">
    <str name="q">*:*</str>
    <str name="indent">true</str>
    <str name="echoParams">explicit</str>
  </lst>
</searchComponent>

"Because each default search component exists by default even if it's not defined explicitly in the solrconfig.xml file, defining them explicitly as in the previous listing will replace the default configuration."

The previous snippet is from the quoted book Solr in Action. I understand that in each SearchHandler I could define these parameters, but if they are defined in the searchComponent (as the book says), wouldn't this configuration apply to all my request handlers, eliminating the need to replicate the same parameter in several parts of my solrconfig.xml (i.e. all the request handlers)?

Regards,

On Sep 23, 2014, at 11:53 PM, Jack Krupansky <j...@basetechnology.com> wrote:

You set the defaults on the search handler, not the search component. See solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
  </lst>
  ...

-- Jack Krupansky

-----Original Message----- From: Jorge Luis Betancourt Gonzalez Sent: Tuesday, September 23, 2014 11:02 AM To: solr-user@lucene.apache.org Subject: Changed behavior in solr 4 ??

Hi:

I'm trying to change the default configuration for the query component of a SearchHandler. Basically I want to set a default value for the rows parameter and have this value shared by all my SearchHandlers. As stated in the solrconfig.xml comments, this could be accomplished by redeclaring the query search component; however, this is not working on Solr 4.9.0, which is the version I'm using. This is my configuration:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="defaults">
    <int name="rows">1</int>
  </lst>
</searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a searchComponent to one of the standard names, that will be used instead of the default." So is this a new desired behavior? Although, just for testing, I redefined the components of the request handler to only use the query component and not all the default components; this is how it looks:

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>

Everything works OK, but the rows parameter is not used, although I'm not specifying the rows parameter on the URL.

Regards,

Concurso Mi selfie por los 5. Detalles en http://justiciaparaloscinco.wordpress.com
point buffer returned as an ellipse, how to configure?
Solr team,

I am indexing geographic points in decimal degrees lat/lon using the location_rpt type in my index. The type is set up like this:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees"/>

My field definition is this:

<field name="pointGeom_rpt" type="location_rpt" indexed="true" stored="true" multiValued="false"/>

My problem is that the returned shape is a very narrow but tall ellipse, likely due to the degrees units and geo=true... but when I change those params to geo=false, the index won't start. This is the query I am using:

String query = "http://myserver:8983/solr/mycore/select?q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt=" + lat + "," + lon + "&d=" + distance + "&wt=json&indent=true&geo=true&rows=" + rows;

I am not using SolrCloud, and I am on version 4.8.0. I also opened this Stack Overflow question; it has some more details and a picture of the return I get: http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse

BTW, I'm an OpenNLP committer and I am very geospatially focused; let me know if you want help with anything geo, and I'll try to carve out some time if needed.

thanks
G$
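A sketch of building that request with explicit separators and escaping (the coordinates, distance, and row count are placeholders). As far as I know, geo is a field-type attribute rather than a request parameter, so it is dropped here:

```python
# Rebuild the geofilt query with explicit "&" separators and URL escaping.
# Coordinates/distance/rows below are placeholders, not values from the thread.
from urllib.parse import urlencode

def geofilt_url(base, field, lat, lon, d, rows):
    params = {
        "q": "*:*",
        "fq": "{!geofilt}",   # geofilt reads sfield, pt, and d
        "sfield": field,
        "pt": f"{lat},{lon}",
        "d": d,
        "wt": "json",
        "indent": "true",
        "rows": rows,
    }
    return base + "/select?" + urlencode(params)

print(geofilt_url("http://myserver:8983/solr/mycore",
                  "pointGeom_rpt", 45.0, -93.0, 10, 20))
```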
Solr stops in between indexing
Hi,

I have Solr configured on a Google Cloud server. Whenever I try to index, it stops partway through and shows connection lost / connection timeout errors. I have 2,200 records; sometimes full indexing stops at 917, sometimes at 1385, sometimes at 2185. I have Apache2 running on Google Cloud on a Debian OS. Earlier it was working fine; it has started giving this error only recently.

Please advise and help.

-- Regards
Madhav Bahuguna
Re: Help in selecting the appropriate feature to obtain results
I call it the 'reverse search' problem (regex indexing). It's almost impossible. You can:

- do it on your own: http://blog.mikemccandless.com/2013/06/build-your-own-finite-state-transducer.html
- create a http://lucene.apache.org/core/4_1_0/memory/org/apache/lucene/index/memory/MemoryIndex.html from the incoming string, and search it with those stored queries with regexps; e.g. check https://www.youtube.com/watch?v=rmRCsrJp2A8
- more realistically, you can index the separate letters from the patterns, search for any of the incoming letters, and post-filter the results that are found.

On Wed, Sep 24, 2014 at 7:04 PM, barrybear <rotibo...@gmail.com> wrote:
> Hi guys, I'm still a beginner with Solr and I'm not sure whether to implement a custom filter query or some other available feature/plugin that I am not aware of in Solr. I am using Solr v4.4.0. I have a collection as an example below:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] },
>   { "description": "group2", "group": ["GEB"] },
>   { "description": "group3", "group": ["G"] }
> ]
>
> The group field is multiValued and contains alphabets which determine the ranking, plus two special characters: ? and *. Placing a ? at the back means any subordinate of that ranking, while * means all levels of subordinates of that particular ranking. If I were to search for group:'GEB', I expect this result:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] },
>   { "description": "group2", "group": ["GEB"] }
> ]
>
> While searching for group:'GE' should return this result:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] }
> ]
>
> And finally, searching for group:'G' should return only one result:
>
> [
>   { "description": "group3", "group": ["G"] }
> ]
>
> Hope that my explanation is clear enough; thanks for your attention and time.
-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
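Mikhail's third option (post-filtering coarse candidates) can be sketched outside Solr. Python's fnmatch happens to implement the same `?` (exactly one character) and `*` (any run of characters) semantics that barrybear describes, so it stands in for the re-check step here:

```python
# Post-filter sketch for the "reverse search" idea: a coarse Solr query
# returns candidate docs; each stored pattern is then re-checked against
# the incoming ranking string. fnmatch's "?" (one char) and "*" (any run)
# match the semantics described in the thread.
import fnmatch

docs = [
    {"description": "group1", "group": ["G?", "GE*"]},
    {"description": "group2", "group": ["GEB"]},
    {"description": "group3", "group": ["G"]},
]

def matching_docs(query):
    """Keep docs where any stored pattern matches the incoming string."""
    return [d["description"] for d in docs
            if any(fnmatch.fnmatchcase(query, pat) for pat in d["group"])]

print(matching_docs("GEB"))  # -> ['group1', 'group2']
print(matching_docs("GE"))   # -> ['group1']
print(matching_docs("G"))    # -> ['group3']
```

The three calls reproduce exactly the results barrybear expects for 'GEB', 'GE', and 'G'.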
Setting of Default Boost in Edismax Search Handler
I have a setup very similar to the /browse handler in the example (http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/solrconfig.xml?view=markup). I am curious whether it is possible to set a default boost function (e.g. bf=log(qty)), so that all query results would reflect it.

Thank you,
O. O.
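A default boost function can be supplied like any other edismax parameter by adding it to the handler's defaults. A sketch against a /browse-style handler (the qty field is the poster's example; the rest of the handler config is omitted):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- applied to every request unless the client sends its own bf -->
    <str name="bf">log(qty)</str>
  </lst>
</requestHandler>
```

If the boost should not be overridable per request, an "invariants" list can be used instead of "defaults".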
Re: MRIT's morphline mapper doesn't co-locate with data
Do you have the Solr JIRA number for the new ingestion tool? Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <whosc...@cloudera.com> wrote:
> Based on our measurements, Lucene indexing is so CPU-intensive that it wouldn't really help much to exploit data locality on read. The overwhelming bottleneck remains the same. Having said that, we have an ingestion tool in the works that will take advantage of data locality for splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1...@gmail.com> wrote:
>> Hi,
>> The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline mapper. The mapper doesn't co-locate with the input data that it processes. Isn't this a performance hit? Ideally, the morphline mapper should run on the hosts that contain most of the data blocks for the input files it processes.
>> Regards, Tom
Solr and hadoop
I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat provided by Elasticsearch for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop?

Best,
Tom
Re: Changed behavior in solr 4 ??
I hadn't used it before this; basically I found out about it in the Solr in Action book and was guided by the comment about redefining the default components by defining a new searchComponent with the same name. Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> I am not aware of any such feature! That doesn't mean it doesn't exist, but I don't recall seeing it in the Solr source code.
>
> -- Jack Krupansky

Concurso Mi selfie por los 5. Detalles en http://justiciaparaloscinco.wordpress.com
Re: Solr and hadoop
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example).

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen <tomchen1...@gmail.com> wrote:
> I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat provided by Elasticsearch for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop?
> Best, Tom
Re: Solr Cloud Default Document Routing
Well, you've picked the absolute worst case for comparison. The increase to double digits is a constant overhead. IOW, let's say your query went from 5ms to 20ms. That 15ms is pretty much the additional overhead no matter what the query; this particular query just happens to be very fast in the first place.

As far as queries going out to all the shards... well, they have to. The query processing cannot know ahead of time (except in this _very_ special case) which shards will generate hits. So the request is sent out to one replica of each shard, and each responds with its top N. The originating node then combines the sub-results to get the IDs of the final top N, then sends a request to each shard hosting one of those top N for the data associated with the document.

If you really need super-efficiency here, you could probably look at CloudSolrServer to get an idea of how to translate from ID to shard, and just do direct requests with distrib=false.

Best,
Erick

On Wed, Sep 24, 2014 at 5:44 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> Hi,
>
> I'm building out a multi-shard Solr collection, as the index size is likely to grow fast. I was testing out the setup with 2 shards on 2 nodes with test data, and indexed a few documents with id as the unique key.
>
> Collection create command:
>
> /solr/admin/collections?action=CREATE&name=multishard&numShards=2
>
> I used this command to upload:
>
> curl http://server/solr/multishard/update/json?commitWithin=2000 --data-binary @data.json -H 'Content-type:application/json'
>
> data.json:
>
> [
>   { "id": 100161200 },
>   { "id": 100161384 }
> ]
>
> When I query one of the nodes with an id constraint, I see the query executed on both shards, which looks inefficient; QTime increased to double digits. I guess Solr would know, based on the id, which shard the data went to. I have a few questions around this, as I could not find pertinent information in the user lists or documentation.
>
> - The query is hitting all shards and replicas. If I have 3 shards and 5 replicas, how would performance be impacted, given that for this very simple case it increased to double digits?
> - Could id lookup queries just go to one shard automatically?
>
> /solr/multishard/select?q=id%3A100161200&wt=json&indent=true&debugQuery=true
>
> "QTime": 13,
> "debug": {
>   "track": {
>     "rid": "-multishard_shard1_replica1-1411605234897-171",
>     "EXECUTE_QUERY": [
>       "http://server1/solr/multishard_shard1_replica1/", [
>         "QTime", "1", "ElapsedTime", "4", "RequestPurpose", "GET_TOP_IDS", "NumFound", "1", "Response", "some resp"],
>       "http://server2/solr/multishard_shard2_replica1/", [
>         "QTime", "1", "ElapsedTime", "6", "RequestPurpose", "GET_TOP_IDS", "NumFound", "0", "Response", "some"]],
>     "GET_FIELDS": [
>       "http://server1/solr/multishard_shard1_replica1/", [
>         "QTime", "0", "ElapsedTime", "4", "RequestPurpose", "GET_FIELDS,GET_DEBUG", "NumFound", "1",
>
> Thanks,
> Susmit
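Erick's distrib=false idea can be sketched as plain URL construction (the replica address below is taken from the debug output in the thread and is of course deployment-specific):

```python
# Once the shard holding an id is known, query that replica directly with
# distrib=false to skip the scatter-gather across all shards.
# The replica URL is a placeholder from the thread's debug output.
from urllib.parse import urlencode

def direct_lookup_url(replica_base, doc_id):
    params = {"q": f"id:{doc_id}", "distrib": "false", "wt": "json"}
    return replica_base + "/select?" + urlencode(params)

print(direct_lookup_url(
    "http://server1/solr/multishard_shard1_replica1", "100161200"))
```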
Re: SolrCloud Slow to boot up
1. What version of Solr are you running?
2. Have you made substantial changes to solrconfig.xml?

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Sep 25, 2014 at 7:19 AM, anand.mahajan <an...@zerebral.co.in> wrote:
> Hello all, Hosted a SolrCloud - 6 Nodes - 36 Shards x 3 Replica each - 108 cores across 6 servers. Moved in about 250M documents in this cluster. When I restart this cluster - only the leaders per shard comes up live instantly (within a minute) and all the replicas are shown as Recovering on the Cloud screen and all 6 servers are doing some processing (consuming about 4 CPUs at the back and doing a lot of Network IO too) In essence its not doing any reads are writes to the index and I dont see any replication/catch up activity going on too at the back, yet the RAM grows consuming all 96GB available on each box. And all the Recovering replicas recover one by one in about an hour or so. Why is it taking so long to boot up, and what is it doing that is consuming so much CPU, RAM and Network IO? All disks are reading at 100% on all servers during this boot up. Is there are setting I might have missed that will help? FYI - The Zookeeper cluster is on the same 6 boxes. Size of the Solr data dir is about 150GB per server and each box has 96GB RAM. Thanks, Anand
Re: SolrCloud Slow to boot up
1. I've hosted it with Helios v 0.07 that ships with Solr 4.10 2. Changes to solrconfig.xml: a. commits every 10 mins b. soft commits every 10 secs c. disabled all caches, as the usage is very random (no end users, only services doing the searches) and mostly single requests d. useColdSearcher = true -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098p4161132.html
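For reference, the commit settings described above correspond roughly to a solrconfig.xml fragment like the following. This is a sketch with assumed values, not the poster's actual file:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 10 minutes; openSearcher=false so it only flushes -->
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit every 10 seconds makes new documents visible to searches -->
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- useColdSearcher=true lets requests hit a searcher before warm-up finishes -->
  <useColdSearcher>true</useColdSearcher>
</query>
```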
Re: Help needed in Indexing and Search on xml content
Hi, You can retrieve the data in XML format as well as in JSON. You need to learn about schema.xml; in it you define the fields present in your XML, which fields you want to search on, etc. So it would be better to take a look at schema.xml; Solr's sample schema should clear most of your doubts. On Sep 25, 2014 5:12 PM, PeriS peri.subrahma...@htcinc.com wrote: Hi Sangeetha, If you can tell me a little bit more about your setup, I can try and help. If you are on Skype, that would be the easiest. Thanks -Peri On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote: Hi Team, I am a newbie to SOLR. I have got search fields stored in an XML file which is stored in MSSQL. I want to index the content of the XML file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as an XML file is that users will be able to add custom input fields with values on their own. Storing these custom fields as columns in MSSQL seems not to be an optimal solution, so we thought of putting them in an XML file and storing that file in the RDBMS. But I am not sure how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this / direct me to some pages which could be of help to me? Thanks Sangeetha --- This message has been scanned for viruses and dangerous content by HTC E-Mail Virus Protection Service. *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global Services to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
Re: /suggest through SolrJ?
You can call anything from SolrJ that you can call from a URL. SolrJ has lots of convenience stuff to set particular parameters, parse the response, etc., but in the end it's communicating with Solr via a URL. Take a look at something like SolrQuery, for instance. It has a nice method setFacetPrefix. Here's the entire method: public SolrQuery setFacetPrefix( String field, String prefix ) { this.set( FacetParams.FACET_PREFIX, prefix ); return this; } which is really this.set( "facet.prefix", prefix ); All it's really doing is setting a SolrParams key/value pair, which is equivalent to facet.prefix=blahblah on a URL. As I remember, there's a setPath method that you can use to set the destination for the request to suggest (or maybe /suggest). It's something like that. Best, Erick On Thu, Sep 25, 2014 at 3:47 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Am I right that I cannot call /suggest (i.e. the corresponding RequestHandler) through SolrJ? What is the preferred way to call Solr handlers/operations not supported by SolrJ from Java? Through new SolrJ Request classes?
Re: Turn off suggester
Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick On Thu, Sep 25, 2014 at 4:18 AM, PeriS peri.subrahma...@htcinc.com wrote: Is there a way to turn off the Solr suggester? I have about 30M records and when Tomcat starts up, it takes a long time (~10 minutes) for the suggester to decompress the data, or it's doing something as it hangs on SolrSuggester.build(). Any ideas please? Thanks -Peri
Re: Solr stops in between indexing
If it was working fine and suddenly stopped, I have to ask: what was the last thing that changed? Frankly, it sounds like your network has started having some problems. Best, Erick On Thu, Sep 25, 2014 at 6:29 AM, madhav bahuguna madhav.bahug...@gmail.com wrote: Hi, I have Solr configured on a Google Cloud server. Whenever I try to index, it stops in between and shows an error: connection lost / connection timeout. I have 2200 records; sometimes it stops full indexing at 917, sometimes 1385, sometimes 2185. I have Apache2 running on Google Cloud on Debian OS. Earlier it was working fine; it has started giving this error only recently. Please advise and help. -- Regards Madhav Bahuguna
Re: /suggest through SolrJ?
On 9/25/2014 8:43 AM, Erick Erickson wrote: You can call anything from SolrJ that you can call from a URL. SolrJ has lots of convenience stuff to set particular parameters, parse the response, etc... But in the end it's communicating with Solr via a URL. Take a look at something like SolrQuery for instance. It has a nice command setFacetPrefix. Here's the entire method: public SolrQuery setFacetPrefix( String field, String prefix ) { this.set( FacetParams.FACET_PREFIX, prefix ); return this; } which is really this.set( facet.prefix, prefix ); All it's really doing is setting a SolrParams key/value pair which is equivalent to facet.prefix=blahblah on a URL. As I remember, there's a setPath method that you can use to set the destination for the request to suggest (or maybe /suggest). It's something like that. Yes, like Erick says, just use SolrQuery for most accesses to Solr on arbitrary URL paths with arbitrary URL parameters. The set method is how you include those parameters. The SolrQuery method Erick was talking about at the end of his email is setRequestHandler(String), and you would set that to /suggest. Full disclosure about what this method actually does: it also sets the qt parameter, but with the modern example Solr config, the qt parameter doesn't do anything -- you must actually change the URL path on the request, which this method will do if the value starts with a forward slash. Thanks, Shawn
Re: Solr and hadoop
I'm aware of the MapReduceIndexerTool (MRIT). That might solve the indexing part -- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- making Solr a data store like HDFS. With a Solr InputFormat, we could make Solr index data available to Hadoop MapReduce. Along the same lines, we could also make Solr index data available to Hive, Spark, etc., like es-hadoop does. Best, Tom On Thu, Sep 25, 2014 at 10:26 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example). On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen tomchen1...@gmail.com wrote: I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat that Elasticsearch provides for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop? Best, Tom
Solr mapred MTree merge stage ~6x slower in 4.10
As an update to this thread, it seems my MTree wasn't completely hanging, it was just much slower in 4.10. If I replace 4.9.0 with 4.10 in my jar the MTree merge stage is 6x (or more) slower (in my case, 20 min becomes 2 hours). I hope to bisect this in the future, but the jobs I'm running take a long time. I haven't tried to see if the issue shows on smaller jobs yet (does 1 minute become 6 minutes?). Brett On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner br...@bretthoerner.com wrote: I have a very weird problem that I'm going to try to describe here to see if anyone has any ah-ha moments or clues. I haven't created a small reproducible project for this but I guess I will have to try in the future if I can't figure it out. (Or I'll need to bisect by running long Hadoop jobs...) So, the facts: * Have been successfully using Solr mapred to build very large Solr clusters for months * As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge phase in 4.10 * Those same jobs (same input, output, and Hadoop cluster itself) succeed if I only change my Solr deps to 4.9 * The job *does succeed* in 4.10 if I use the same data to create more, but smaller shards (e.g. 12x as many shards each 1/12th the size of the job that fails) * Creating my normal size shards (the size I want, that works in 4.9) the job hangs with 2 mappers running, 0 reducers in the MTree merge phase * There are no errors or warning in the syslog/stderr of the MTree mappers, no errors ever echo'd back to the interactive run of the job (mapper says 100%, reduce says 0%, will stay forever) * No CPU being used on the boxes running the merge, no GC happening, JVM waiting on a futex, all threads blocked on various queues * No disk usage problems, nothing else obviously wrong with any box in the cluster I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred contrib, mostly some test stuff. 
I didn't see any transitive dependency changes in Solr/Lucene that look like they would affect me.
Re: Solr and hadoop
Hi Tom, I am not aware of a Solr InputFormat implementation yet. The /export handler, which outputs entire sorted result sets, was designed to support these types of bulk export operations efficiently. I think a Solr InputFormat would be an excellent project to begin working on. Also, SOLR-6526 is underway to provide SolrCloud with native streaming aggregation capabilities. Joel Bernstein Search Engineer at Heliosearch On Thu, Sep 25, 2014 at 12:34 PM, Tom Chen tomchen1...@gmail.com wrote: I'm aware of the MapReduceIndexerTool (MRIT). That might solve the indexing part -- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- making Solr a data store like HDFS. With a Solr InputFormat, we could make Solr index data available to Hadoop MapReduce. Along the same lines, we could also make Solr index data available to Hive, Spark, etc., like es-hadoop does. Best, Tom
Why does the q parameter change?
Good afternoon all, I just implemented a phrase search and the parsed query gets changed from rapid prototyping to rapid prototype. I used the Solr analyzer and prototyping was unchanged, so I think I've ruled out a tokenizer. Can anyone tell me what's going on? Here's the query: q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0 Here's the debug output; as you can see, prototyping gets changed to just prototype. What's causing this and how do I turn it off? Thanks,

<lst name="debug">
  <lst name="queryBoosting">
    <str name="q">rapid prototyping</str>
    <null name="match"/>
  </lst>
  <str name="rawquerystring">rapid prototyping</str>
  <str name="querystring">rapid prototyping</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((text:rapid)) DisjunctionMaxQuery((text:prototype)))~2) DisjunctionMaxQuery((text:"rapid prototype"^40.0)))/no_coord</str>
  <str name="parsedquery_toString">+(((text:rapid) (text:prototype))~2) (text:"rapid prototype"^40.0)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

-- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179.html
Re: Why does the q parameter change?
OK, I think I'm on to something. I omitted this parameter, which means it defaults to false on my text field. I need to set it to true and see what happens... autoGeneratePhraseQueries=true If I'm reading the wiki right, this parameter, if true, will preserve phrase queries... -- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161185.html
Re: Turn off suggester
Isn't it one of the Solr components? Can it be just removed from the default chain? Random poking in the dark here. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 25 September 2014 10:45, Erick Erickson erickerick...@gmail.com wrote: Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick On Thu, Sep 25, 2014 at 4:18 AM, PeriS peri.subrahma...@htcinc.com wrote: Is there a way to turn off the Solr suggester? I have about 30M records and when Tomcat starts up, it takes a long time (~10 minutes) for the suggester to decompress the data, or it's doing something as it hangs on SolrSuggester.build(). Any ideas please? Thanks -Peri
Re: Turn off suggester
The SuggestComponent is not in the default components list. There must be a request handler with this component added explicitly in the solrconfig.xml. Tomás On Thu, Sep 25, 2014 at 12:22 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Isn't it one of the Solr components? Can it be just removed from the default chain? Random poking in the dark here. On 25 September 2014 10:45, Erick Erickson erickerick...@gmail.com wrote: Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick
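Concretely, the thing to look for in solrconfig.xml is wiring along these lines (the component name, dictionary name and field here are illustrative, not from the poster's config). Removing or commenting out the component and the handler that references it disables the suggester entirely; alternatively, buildOnStartup=false avoids the expensive rebuild at startup:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">title</str>
    <!-- buildOnStartup=true is a common cause of long startup times -->
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```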
Re: Help needed in Indexing and Search on xml content
Have a look at data import handler and you'll need to use nested entities. That should get you at least a demo. Then you can decide whether that's good enough. Regards, Alex On 25/09/2014 3:51 am, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote: Hi Team, I am a newbie to SOLR. I have got search fields stored in a xml file which is stored in MSSQL. I want to index on the content of the xml file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as XML file is , the users will be able to add custom input fields on their own with values. Storing these custom fields as columns in MSSQL seems to be not an optimal solution. So we thought of putting it in XML file and store that file in RDBMS. But I am not sure on how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this/ direct me to some pages which could be of help to me ? Thanks Sangeetha
Re: Why does the q parameter change?
No, apparently it's the KStemFilter. Should I turn this off at query time? I'll put this in another question... -- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161199.html
Best practice for KStemFilter query or index or both?
Good afternoon, Here's my configuration for a text field. I have the same configuration for index and query time. Is this valid? What's the best practice for these: query, index, or both? As for synonyms, I've read conflicting reports on when to use them, but I'm currently changing it over to indexing time only. Thanks,

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-KStemFilter-query-or-index-or-both-tp4161201.html
RE: Best practice for KStemFilter query or index or both?
Hi - most filters should be used on both sides, especially stemmers, accent foldings and obviously lowercasing. Synonyms only on one side, depending on how you want to utilize them. Markus -----Original message----- From: eShard zim...@yahoo.com Sent: Thursday 25th September 2014 22:23 To: solr-user@lucene.apache.org Subject: Best practice for KStemFilter query or index or both? Good afternoon, Here's my configuration for a text field. I have the same configuration for index and query time. Is this valid? What's the best practice for these: query, index, or both?
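Applying that advice to the configuration above, a corrected fieldType might look like the sketch below: stemmer, stop words and lowercasing on both sides, synonyms on the index side only (as the original poster planned). This assumes the stray StandardTokenizerFactory listed as a filter and the analyzer type="select" block were unintended; Solr recognizes the index, query and multiterm analyzer types, not select:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- synonyms applied at index time only, per the original poster's plan -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```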
Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term
The difference comes in the fact that when you query the same form it matches 2 tokens, including the less common one. When you query a different form you only match the more common form. So really you're getting the boost from both the tiny difference in TF*IDF and the extra token that you match on. However, I agree that adding a payload might be a better solution. - Original Message - Hi - but this makes no sense; they are scored as equals, except for tiny differences in TF and IDF. What you would need is something like a stemmer that preserves the original token and gives a <1 payload to the stemmed token. The same goes for filters like decompounders and accent folders that change the meaning of words. -Original message- From: Diego Fernandez difer...@redhat.com Sent: Wednesday 17th September 2014 23:37 To: solr-user@lucene.apache.org Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term I'm not 100% on this, but I imagine this is what happens (using -> to mean "tokenized to"): Suppose that you index: I am running home -> am run running home. If you then query running home -> run running home, "running" matches both tokens and thus gives a higher score than if you query runs home -> run runs home. - Original Message - The Solr wiki says: A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki reproduced below that both the stemmed and original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? Tom - A repeated question is how can I have the original term contribute more to the score than the stemmed version?
In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. This filter emits two tokens for each input token, one of them marked with the Keyword attribute. Stemmers that respect keyword attributes will pass the token so marked through without change. So the effect of this filter is to index both the original word and the stemmed version. The 4 stemmers listed above all respect the keyword attribute. For terms that are not changed by stemming, this will result in duplicate, identical tokens in the document. This can be alleviated by adding the RemoveDuplicatesTokenFilterFactory.

<fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-- Diego Fernandez - 爱国 Software Engineer GSS - Diagnostics IRC: aiguofer on #gss and #customer-platform
Re: point buffer returned as an elipse, how to configure?
Hi Mark, I asked a follow-up question/observation on your Stackoverflow instantiation of your question. I also wrote the following, which doesn't yet fit into an answer because I don't know what problem you are experiencing: Some technical details: geo=true|false is an attribute on the field type; it isn't a request parameter. Should you want to change it to geo=false, you will also have to set the worldBounds and certainly re-index. Almost any change to a field type in the schema requires a re-index. If your units stay degrees then you can continue to use lat,lon format, but if you use another unit specific to some projection then it's not degrees, and I suggest switching to "x y" to avoid confusion with lat,lon format. FYI, units=degrees is required but it has no effect. When geo=true, the 'd' in geofilt is kilometers; when geo=false, 'd' is in the units of the numbers you put into the index. The docs are here: https://cwiki.apache.org/confluence/display/solr/Spatial+Search It would be awesome if you want to help further spatial in Lucene/Solr. This year is looking like a great year for spatial — I'm particularly excited about a new "FlexPrefixTree" from Varun (GSOC 2014) together with the latest advances in auto-prefixing to be released in Lucene/Solr 5.0. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Thu, Sep 25, 2014 at 8:42 AM, Mark G ma...@apache.org wrote: Solr team, I am indexing geographic points in decimal degrees lat/lon using the location_rpt type in my index. The type is set up like this:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees"/>

My field definition is this:

<field name="pointGeom_rpt" type="location_rpt" indexed="true" stored="true" multiValued="false"/>

My problem is that the return is a very narrow but tall ellipse, likely due to the degrees and geo=true...
but when I change those params to geo=false the index won't start. This is the query I am using:

String query = "http://myserver:8983/solr/mycore/select?q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt=" + lat + "," + lon + "&d=" + distance + "&wt=json&indent=true&geo=true&rows=" + rows;

I am not using SolrCloud, and I am on version 4.8.0. I also opened up this stackoverflow question; it has some more details and a picture of the return I get: http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse BTW, I'm an OpenNLP committer and I am very geospatially focused; let me know if you want help with anything geo, I'll try to carve out some time if needed. thanks G$
AW: /suggest through SolrJ?
Thx to you two. Just in case anybody else is trying to do this: the following SolrJ code corresponds to the HTTP request GET http://localhost:8983/solr/solrpedia/suggest?q=atmo from Solr in Action (chapter 10):

SolrServer server = new HttpSolrServer("http://localhost:8983/solr/solrpedia");
SolrQuery query = new SolrQuery("atmo");
query.setRequestHandler("/suggest");
QueryResponse queryresponse = server.query(query);
...
queryresponse.getSpellCheckResponse().getSuggestions();

-----Original message----- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, 25 September 2014 17:37 To: solr-user@lucene.apache.org Subject: Re: /suggest through SolrJ? The SolrQuery method Erick was talking about at the end of his email is setRequestHandler(String), and you would set that to /suggest. Full disclosure about what this method actually does: it also sets the qt parameter, but with the modern example Solr config, the qt parameter doesn't do anything -- you must actually change the URL path on the request, which this method will do if the value starts with a forward slash. Thanks, Shawn