Re: Which one is it cs or cz for Czech language?
Hi,

On Wed, Mar 18, 2015 at 9:28 AM, steve sc_shep...@hotmail.com wrote:
> FYI: http://www.w3schools.com/tags/ref_country_codes.asp
> CZECH REPUBLIC: CZ
> No entry for CS

Exactly, steve. CZ is the country code, but we are talking about language codes (which is "cs"), since those Solr types deal with languages, not with countries. Or were you trying to point out something else?

Thanks,
Eduard

P.S.: Here's the list of ISO 639-1 two-letter language codes for reference:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

From: md...@apache.org
Date: Tue, 17 Mar 2015 12:45:57 -0500
Subject: Re: Which one is it cs or cz for Czech language?
To: solr-user@lucene.apache.org

Probably a historical artifact. "cz" is the country code for the Czech Republic; "cs" is the language code for Czech. Once, "cs" was also the country code for Czechoslovakia, leading some folks to accidentally conflate the two.

On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru enygma2...@gmail.com wrote:
> Hi,
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at all.
> We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping).
> Now, we can easily fix this on our end by changing the mapping to name="*_cs", but what we are really wondering is why Lucene/Solr uses "cz" (country code) instead of "cs" (language code) in both its text_cz field type and its stopwords_cz.txt file. Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed?
>
> Thanks,
> Eduard
>
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
index duplicate records from data source into 1 document
Hi,

If I have duplicate records in my source data (DB or delimited files), for simplicity's sake of the following nature:

Product Id   Business Type
----------   -------------
12345        Exporter
12345        Agent
12366        Manufacturer
12377        Exporter
12377        Distributor

There are other fields with multiple values as well. How do I index the duplicate records into 1 document? E.g. Product Id 12345 will be 1 document, 12366 as 1 document, and 12377 as 1 document.

-Derek
Re: Which one is it cs or cz for Czech language?
It does indeed appear that use of the _cz suffix is a mistake - those suffixes are supposed to be language codes. Sure, there generally tends to be a one-to-one relationship between language and country, but clearly that is not as absolute as a casual observer might misguidedly think. I think it's worth a Jira - text types should use language codes, not country codes.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru enygma2...@gmail.com wrote:
> Hi,
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at all.
> We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping).
> Now, we can easily fix this on our end by changing the mapping to name="*_cs", but what we are really wondering is why Lucene/Solr uses "cz" (country code) instead of "cs" (language code) in both its text_cz field type and its stopwords_cz.txt file. Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed?
>
> Thanks,
> Eduard
>
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
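The mismatch described in this thread is easy to reproduce: dynamic-field lookups match by suffix string, so a schema keyed on the country code "cz" can never match a request built from the language code "cs". A minimal sketch (the field base name and the code table are hypothetical, for illustration only):

```python
# Excerpt of ISO 639-1 language codes; note Czech is "cs", not "cz".
LANGUAGE_CODES = {"cs": "Czech", "en": "English", "de": "German"}

def dynamic_field_name(base, lang):
    """Build the field name an application would ask Solr for at runtime."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang}")
    return f"{base}_{lang}"

# The schema maps *_cz, but the runtime request is built from "cs",
# so the lookup misses the text_cz dynamic field:
name = dynamic_field_name("title", "cs")
```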
Re: Whole RAM consumed while Indexing.
Hi,

If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?

On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:
> Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.

On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote:
> First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test.
> Best, Erick

On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi Erick,
> You are right: **overlapping searchers warning messages** are coming in the logs, and the **numDocs numbers** are changing while documents are being added during indexing. Any help?

On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote:
> First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background:
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit.
> You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc. Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change?
> Best, Erick

On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi Alexandre,
> *Hard Commit* is:
>
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
> *Soft Commit* is:
>
>   <autoSoftCommit>
>     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>   </autoSoftCommit>
>
> And I am committing 2 documents each time. Is this a good config for committing? Or am I doing something wrong?

On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)?
> Regards, Alex.
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote:
> Hello,
> I have written a python script to index 2 documents at a time on Solr. I have 28 GB RAM with 8 CPUs. When I started indexing, 15 GB RAM was free. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it throws *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on Zookeeper, by which all nodes went down, but I am not sure about that. Any help please, or anything else that could be happening, and how to overcome this issue? Please assist me towards the right path. Thanks.

Warm Regards,
Nitin Solanki
Re: Add replica on shards
You can do the same simply with something like this:

http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1

The main part is shard=shard1: when you create a core with an existing shard (the core name doesn't matter; we use collection_shard1_replica2, but you can use whatever you want), this core becomes a replica and copies data from the leading shard.

--
View this message in context: http://lucene.472066.n3.nabble.com/Add-replica-on-shards-tp4193659p4193732.html
Sent from the Solr - User mailing list archive at Nabble.com.
schema.xml xsd file
Hello,

Where can I find the XSD file for the schema.xml file? Thanks in advance!

Best regards,

Pedro Figueiredo
Senior Engineer
pjlfigueir...@criticalsoftware.com
M. 934058150
Rua Engº Frederico Ulrich, nº 2650, 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com
PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY. CMMI® is registered in the USPTO by CMU.
Re: index duplicate records from data source into 1 document
I'd use SolrJ, pull the docs in productId order and combine records with the same product ID into a single doc. Here's a starter set for indexing from a DB with SolrJ. It has Tika processing in it as well, but you can pull that out pretty easily:

https://lucidworks.com/blog/indexing-with-solrj/

Best, Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh d...@globalsources.com wrote:
> Hi
> If I have duplicate records in my source data (DB or delimited files), for simplicity's sake of the following nature:
>
> Product Id   Business Type
> ----------   -------------
> 12345        Exporter
> 12345        Agent
> 12366        Manufacturer
> 12377        Exporter
> 12377        Distributor
>
> There are other fields with multiple values as well. How do I index the duplicate records into 1 document? E.g. Product Id 12345 will be 1 document, 12366 as 1 document, and 12377 as 1 document.
> -Derek
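The merge step Erick describes (pull rows ordered by product ID, collapse runs with the same ID into one multi-valued document) can be sketched in Python as well; the field names here are hypothetical, and a real indexer would send the merged dicts to Solr over HTTP rather than collecting them in a list:

```python
from itertools import groupby
from operator import itemgetter

def merge_rows(rows):
    """Collapse source rows sharing a product id into one document
    with a multi-valued business_type field."""
    # groupby only merges adjacent rows, so sort by product id first
    # (the equivalent of ORDER BY product_id in the SQL that feeds this).
    rows = sorted(rows, key=itemgetter("product_id"))
    docs = []
    for pid, group in groupby(rows, key=itemgetter("product_id")):
        docs.append({
            "id": pid,
            "business_type": [r["business_type"] for r in group],
        })
    return docs

rows = [
    {"product_id": "12345", "business_type": "Exporter"},
    {"product_id": "12345", "business_type": "Agent"},
    {"product_id": "12366", "business_type": "Manufacturer"},
    {"product_id": "12377", "business_type": "Exporter"},
    {"product_id": "12377", "business_type": "Distributor"},
]
docs = merge_rows(rows)
# Three documents: one each for 12345, 12366, 12377
```

For this to work in Solr, business_type must of course be declared multiValued="true" in the schema.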
Re: Whole RAM consumed while Indexing.
Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all.

Best, Erick

On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi,
> If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?
Re: Add replica on shards
Thanks Norgorn. I did the same thing but in a different manner, like:

localhost:8983/solr/admin/cores?action=CREATE&name=wikingram_shard4_replica3&collection=wikingram&property.shard=shard4

On Wed, Mar 18, 2015 at 7:20 PM, Norgorn lsunnyd...@mail.ru wrote:
> You can do the same simply with something like this:
> http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1
> The main part is shard=shard1: when you create a core with an existing shard (the core name doesn't matter; we use collection_shard1_replica2, but you can use whatever you want), this core becomes a replica and copies data from the leading shard.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Add-replica-on-shards-tp4193659p4193732.html
> Sent from the Solr - User mailing list archive at Nabble.com.
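Both variants in this thread are plain HTTP GETs against the CoreAdmin API, so they can be issued from any client. A sketch of building the request (host, collection, and core names are the placeholders used in the thread):

```python
from urllib.parse import urlencode

def create_replica_url(host, collection, core_name, shard):
    """Build a CoreAdmin CREATE request that adds a replica by
    creating a core attached to an existing shard."""
    params = {
        "action": "CREATE",
        "name": core_name,        # core name is arbitrary
        "collection": collection,
        "shard": shard,           # the existing shard this core joins
    }
    return f"http://{host}/solr/admin/cores?{urlencode(params)}"

url = create_replica_url("localhost:8983", "wikingram",
                         "wikingram_shard1_replica2", "shard1")
```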
Re: SolrCloud: data is present on one shard only
On 3/17/2015 3:54 AM, Aman Tandon wrote:
> I indexed the data in my SolrCloud architecture (2 shards present on 2 separate instances; on one instance I have the replicas of both shards, which are present on the other 2 instances). And when I am looking at the index via the admin interface, it is present on a single instance. Shouldn't the data be present on both shards?

The question here is not clear, at least to me. What exactly are you looking at in the admin UI, what are you seeing, and what are you expecting to see?

Thanks,
Shawn
Re: Unable to index rich-text documents in Solr Cloud
Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container?

Best, Erick

On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> Hi everyone,
> I'm having some issues with indexing rich-text documents from Solr Cloud. When I try to index a pdf or word document, I get the following error:
>
> org.apache.solr.common.SolrException: Bad Request
> request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
>
> I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details:
> Shard1: 192.168.2.2:8983
> Shard2: 192.168.2.2:8984
> Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud?
> Regards, Edwin
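If the failure really is a size limit, note that in recent Solr versions the upload limits live in solrconfig.xml's requestDispatcher section rather than only in the servlet container. A hedged sketch of raising them (the values here are illustrative, not recommendations):

```xml
<requestDispatcher>
  <!-- limits are in kilobytes; raise them if large PDFs are rejected -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="20480"
                  formdataUploadLimitInKB="2048"/>
</requestDispatcher>
```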
Re: Whole RAM consumed while Indexing.
On 3/18/2015 9:44 AM, Nitin Solanki wrote:
> I am just saying; I want to be sure about the commit-frequency difference: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits every 60 seconds, then each commit will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it?

Even if the commit only handles a single document and it's a soft commit, it is an expensive operation in terms of CPU, and in a garbage-collected environment like Java, memory churn as well. A commit also invalidates the Solr caches, so if you have autowarming turned on, then you have the additional overhead of doing a bunch of queries to warm the new cache - on every single soft commit.

Doing commits as often as three times a second (you did say the interval was 300 milliseconds) is generally a bad idea. Increasing the interval to once a minute will take a huge amount of load off of your servers, so indexing will happen faster.

Thanks,
Shawn
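The advice in this thread translates to something like the following in solrconfig.xml (60-second intervals, as suggested; the right values depend on how long you can wait for new documents to become searchable):

```xml
<autoCommit>
  <!-- hard commit: flush to stable storage, but do not open a new searcher -->
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: controls when newly indexed documents become visible -->
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```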
Multiple words suggestion
Hello there,

Does Solr 4.x (or even 5) support *multiple word suggestions*? I mean, if my query is *tozota hilox* and I activate the spellcheck component, each word is treated separately: *toyota* is suggested for *tozota*, and *hilux* is suggested for *hilox*. But what I need is a complete suggestion for the whole query, i.e. *toyota hilux*, which would be suggested when the user's query is *tozota hilox*.

Please see below the *spellcheck component* from my *solrconfig.xml* (I changed only the *field*):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">recherche</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">recherche</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

--
Best regards,
Hakim Benoudjit
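For whole-query suggestions specifically, Solr's spellcheck component can collate the per-word corrections back into a single rewritten query via the spellcheck.collate parameters. A sketch of request-handler defaults that enable it (the handler name and the counts are illustrative):

```xml
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <!-- combine the best per-word corrections into one suggestion,
         e.g. "tozota hilox" -> "toyota hilux" -->
    <str name="spellcheck.collate">true</str>
    <int name="spellcheck.maxCollations">5</int>
    <!-- test candidate collations against the index so only
         collations that would return results are offered -->
    <int name="spellcheck.maxCollationTries">10</int>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```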
Re: SolrCloud: data is present on one shard only
Hi Shawn,

I apologize for my unclear mail. I have 120,000 documents, and I am indexing the data in my SolrCloud architecture having two shards. I had the expectation that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong?

With Regards
Aman Tandon

On Wed, Mar 18, 2015 at 7:26 PM, Shawn Heisey apa...@elyograg.org wrote:
> On 3/17/2015 3:54 AM, Aman Tandon wrote:
>> I indexed the data in my SolrCloud architecture (2 shards present on 2 separate instances; on one instance I have the replicas of both shards, which are present on the other 2 instances). And when I am looking at the index via the admin interface, it is present on a single instance. Shouldn't the data be present on both shards?
> The question here is not clear, at least to me. What exactly are you looking at in the admin UI, what are you seeing, and what are you expecting to see?
> Thanks, Shawn
Re: Whole RAM consumed while Indexing.
Hi Erick,

I am just saying; I want to be sure about the commit-frequency difference: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits every 60 seconds, then each commit will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it?

On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote:
> Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all.
> Best, Erick

On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi,
> If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?
Re: SolrCloud: data is present on one shard only
On 3/18/2015 9:47 AM, Aman Tandon wrote:
> I have 120,000 documents, and I am indexing the data in my SolrCloud architecture having two shards. I had the expectation that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong?

In your admin UI, click on Cloud and then Tree. You should see a /collections entry in the list. Open that, and then click on the collection you are concerned about. In the right side of that window, there will be a bunch of fields with values. Below that will be a small snippet of JSON text, and one of the bits of info in that JSON will be a field called "router" ... what is router set to?

If it is "implicit", then your documents will not be automatically dispersed across your shards when you index. They will be indexed into the shard that received your indexing requests. You will need to create a new collection where the router is "compositeId".

Thanks,
Shawn
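Creating the replacement collection Shawn mentions is a Collections API call where router.name can be set explicitly. A sketch of building that request (the host, collection name, and counts here are placeholders):

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, replication_factor):
    """Build a Collections API CREATE request using the compositeId
    router, so documents are hashed across shards automatically."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "router.name": "compositeId",
    }
    return f"http://{host}/solr/admin/collections?{urlencode(params)}"

url = create_collection_url("localhost:8983", "mycollection", 2, 2)
```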
Re: Solr returns incorrect results after sorting
Hi Raj,

The group.sort you are using defines multiple criteria. The first criterion is the big Solr function starting with max(). This means that inside each group the documents will be sorted by this criterion, and if the values are equal between two documents, the comparison falls back to the second criterion (inStock_boolean desc), and so on.

*Even though if i add price asc in the group.sort, but still the main sort does not consider that.*

The main sort does not have to consider what's in the group.sort. The group.sort defines the way the documents are sorted inside each group. So if you want to sort the documents inside each group in the same order as the main sort, you can remove the group.sort, or you can have a primary sort on pricecommon_double desc in your group.sort:

*group.sort=pricecommon_double desc, max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc, inStock_boolean desc, geodist() asc*

Cheers,
Jim

2015-03-18 7:28 GMT+01:00 kumarraj rajitpro2...@gmail.com:
> Hi Jim,
> Yes, you are right: that document has price 499.99. But I want to consider the first record in the group as part of the main sort. Even if I add price asc in the group.sort, the main sort still does not consider it:
>
> group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc,geodist() asc,pricecommon_double asc&sort=pricecommon_double desc
>
> Is there any other workaround so that the sort is always based on the first record which is pulled up in each group?
> Regards, Raj
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193658.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Whole RAM consumed while Indexing.
When I kept my configuration to 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got the data size of the whole index to be 6GB after completing the indexing. When I changed the configuration to 6 for soft commit and 6 for hard commit and indexed same data then I got the data size of the whole index to be 5GB after completing the indexing. But the number of documents in the both scenario were same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure on commits difference.. What if I do frequent commits or not? And why I am saying that I need to commit things so very quickly because I have to index 28GB of data which takes 7-8 hours(frequent commits). As you said, do commits after 6 seconds then it will be more expensive. If I don't encounter with **overlapping searchers warning messages** then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem, you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing(softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6) as you both said. Will fast indexing fail to index some data? Any suggestion on this ? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. 
Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 6 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that' s a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. Something, **overlapping searchers warning messages** are coming in logs. **numDocs numbers** are changing when documents are adding at the time of indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 6 and hard commits of 6 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. 
I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change? Best, Erick On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 2 documents each time. Is that a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM,
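Erick's recommendation above — 60-second soft and hard commit intervals, keeping openSearcher=false on the hard commit — would look like this as a solrconfig.xml fragment (a sketch of the stock config with the suggested starting values; tune from there):

```xml
<!-- Hard commit: flush the transaction log and fsync segments every 60s,
     but do NOT open a new searcher (relatively cheap, safe to run often) -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: open a new searcher (making docs visible, invalidating
     caches, and triggering autowarming) at most once every 60s -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```

With these intervals, newly indexed documents become searchable up to a minute after they arrive, unless a client issues an explicit commit.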
Re: SolrCloud: data is present on one shard only
Okay Shawn, thanks, I will try your suggestion and will update here. With Regards Aman Tandon On Wed, Mar 18, 2015 at 9:39 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/18/2015 9:47 AM, Aman Tandon wrote: I have 120,000 documents, and I am indexing the data in my solrcloud architecture with two shards. I was under the impression that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong? In your admin UI, click on Cloud and then Tree. You should see a /collections entry in the list. Open that, and then click on the collection you are concerned about. In the right side of that window, there will be a bunch of fields with values. Below that will be a small snippet of JSON text, and one of the bits of info in that JSON will be a field called router ... what is router set to? If it is implicit then your documents will not be automatically dispersed across your shards when you index. They will be indexed into the shard that received your indexing requests. You will need to create a new collection where the router is compositeId. Thanks, Shawn
Re: schema.xml xsd file
There isn't one. The question has been bandied back and forth several times, but the reaction is that an XSD would be more trouble than it's worth, especially as it would have to handle any customizations that anyone wanted to throw at, say, custom field types. Best, Erick On Wed, Mar 18, 2015 at 7:45 AM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, Where can I find the xsd file for the schema.xml file? Thanks in advance! Best regards, *Pedro Figueiredo* Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 [image: CRITICAL Software] Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 RATED COMPANY http://cmmiinstitute.com/ CMMI® is registered in the USPTO by CMU http://www.cmu.edu/
Re: Whole RAM consumed while Indexing.
Probably merged somewhat differently with some terms indexes repeating between segments. Check the number of segments in data directory.And do search for *:* and make sure both do have the same document counts. Also, In all these discussions, you still haven't answered about how fast after indexing you want to _search_? Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) index in afterwards. Unless, of course, I missed it, it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration to 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got the data size of the whole index to be 6GB after completing the indexing. When I changed the configuration to 6 for soft commit and 6 for hard commit and indexed same data then I got the data size of the whole index to be 5GB after completing the indexing. But the number of documents in the both scenario were same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure on commits difference.. What if I do frequent commits or not? And why I am saying that I need to commit things so very quickly because I have to index 28GB of data which takes 7-8 hours(frequent commits). As you said, do commits after 6 seconds then it will be more expensive. If I don't encounter with **overlapping searchers warning messages** then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem, you haven't explained why you need to commit things so very quickly. 
I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing(softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6) as you both said. Will fast indexing fail to index some data? Any suggestion on this ? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 6 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that' s a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. Something, **overlapping searchers warning messages** are coming in logs. **numDocs numbers** are changing when documents are adding at the time of indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. 
I'd start with soft commits of 6 and hard commits of 6 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it;s _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in
Re: copy field from boolean to int
I already use this field elsewhere, so I don't want to change its type. I did implement an UpdateRequestProcessor to copy from a bool to an int. This works, but even better would be to fix Solr so that I can use DocValues with boolean. So, I am going to try to get that working as well. On Tue, Mar 17, 2015 at 10:25 PM, William Bell billnb...@gmail.com wrote: Can you reindex? Just use 1,0. On Tue, Mar 17, 2015 at 6:08 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Can you open a jira to add docValues support for BoolField? ... i can't think of any good reason not to directly support that in Solr for BoolField ... seems like just an oversight that slipped through the cracks. For now, your best bet is probably to use an UpdateProcessor ... maybe 2 instances of RegexReplaceProcessorFactory to match true and false and replace them with 0 and 1? : Date: Tue, 17 Mar 2015 17:57:03 -0700 : From: Kevin Osborn kosb...@centraldesktop.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: copy field from boolean to int : : I was hoping to use DocValues, but one of my fields is a boolean, which is : not currently supported by DocValues. I can use a copyField to convert my : boolean to a string. Is there any way to use a copyField to convert from : a boolean to a tint? -Hoss http://www.lucidworks.com/ -- Bill Bell billnb...@gmail.com cell 720-256-8076
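Hoss's UpdateProcessor suggestion could be sketched in solrconfig.xml roughly as below. This is illustrative only: the chain name, the field names, and the use of CloneFieldUpdateProcessorFactory to perform the copy are assumptions, not from the thread; the two RegexReplaceProcessorFactory instances are what Hoss proposed.

```xml
<updateRequestProcessorChain name="bool-to-int">
  <!-- Hypothetical: clone the boolean field into a separate int-typed field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">in_stock_b</str>
    <str name="dest">in_stock_i</str>
  </processor>
  <!-- Two RegexReplaceProcessorFactory instances, per Hoss: true -> 1, false -> 0 -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">in_stock_i</str>
    <str name="pattern">true</str>
    <str name="replacement">1</str>
  </processor>
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">in_stock_i</str>
    <str name="pattern">false</str>
    <str name="replacement">0</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per request (or per update handler) via the update.chain parameter.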
RE: schema.xml xsd file
:( ok, thank you. Pedro Figueiredo Senior Engineer -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 18 March 2015 15:28 To: solr-user@lucene.apache.org Subject: Re: schema.xml xsd file There isn't one. The question has ben bandied back and forth several times, but the reaction is that an XSD would be more trouble than it's worth, especially as it would have to handle any customizations that anyone wanted to throw at, say, custom field types. Best, Erick On Wed, Mar 18, 2015 at 7:45 AM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, Where can I find the xsd file for the schema.xml file? Thanks in advanced! Best regards, *Pedro Figueiredo* Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 [image: CRITICAL Software] Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 RATED COMPANY http://cmmiinstitute.com/CMMI® is registered in the USPTO by CMU http://www.cmu.edu/
Re: Which one is it cs or cz for Czech language?
: Probably a historical artifact. Yeah, probably. Fixing the Solr example configs would be fairly trivial -- the names are just symbolic strings -- but currently they are all consistent with the Lucene package names, which would be a more complex change from a back-compat standpoint -- I've opened some linked issues; hopefully someone who is more of an expert on the naming conventions of these packages can chime in and we can clean this up... https://issues.apache.org/jira/browse/SOLR-7267 https://issues.apache.org/jira/browse/LUCENE-6366 -Hoss http://www.lucidworks.com/
Unable to index rich-text documents in Solr Cloud
Hi everyone, I'm having some issues with indexing rich-text documents from Solr Cloud. When I try to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin
Re: Add replica on shards
Any help please... On Wed, Mar 18, 2015 at 12:02 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I have created 8 shards on a collection named **wikingram**. At that time I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I ran this - **sudo curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr"** but it is not working. It throws an error - <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">86</int> </lst> <str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str> <lst name="exception"> <str name="msg">Could not find collection : null</str> <int name="rspCode">400</int> </lst> <lst name="error"> <str name="msg">Could not find collection : null</str> <int name="code">400</int> </lst> </response> Any help on this?
RE: Which one is it cs or cz for Czech language?
FYI: http://www.w3schools.com/tags/ref_country_codes.asp - CZECH REPUBLIC: CZ. No entry for CS. From: md...@apache.org Date: Tue, 17 Mar 2015 12:45:57 -0500 Subject: Re: Which one is it cs or cz for Czech language? To: solr-user@lucene.apache.org Probably a historical artifact. cz is the country code for the Czech Republic, cs is the language code for Czech. Once, cs was also the country code for Czechoslovakia, leading some folks to accidentally conflate the two. On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru enygma2...@gmail.com wrote: Hi, First of all, a bit of a disclaimer: I am not a Czech language speaker, at all. We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this: <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" /> ...but at runtime, we ask for the language code cs (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping). Now, we can easily fix this on our end by fixing the mapping to name="*_cs", but what we are really wondering now is why does Lucene/Solr use cz (country code) instead of cs (language code) in both its text_cz field and its stopwords_cz.txt file? Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed? Thanks, Eduard -- [1] http://jira.xwiki.org/browse/XWIKI-11897 [2] http://en.wikipedia.org/wiki/Czech_language
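For reference, Eduard's fix on the XWiki side — keying the dynamic field off the ISO 639-1 language code rather than the country code — amounts to a one-line schema.xml change (a sketch; the stock field type keeps its text_cz name):

```xml
<!-- "cs" is the ISO 639-1 code for Czech; "cz" is the country code.
     Only the dynamic-field suffix changes; the analysis chain (text_cz) stays the same. -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true" multiValued="true"/>
```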
Re: schema.xml xsd file
On 3/18/2015 8:45 AM, Pedro Figueiredo wrote: Where can I find the xsd file for the schema.xml file? As Erick said, current XSD files do not exist. There are some (now probably outdated) XSD files in a patch on this issue: https://issues.apache.org/jira/browse/SOLR-1758 Thanks, Shawn
RE: Distributed IDF performance
Anshum, Jack - do any of you have a cluster at hand to get some real results on this? After testing the actual functionality for quite some time while the final patch was in development, we have not had the chance to work on performance tests. We are still on Solr 4.10 and have to port lots of Lucene stuff to 5. I would sure like to see some numbers from either of you :) Markus -Original message- From:Anshum Gupta ans...@anshumgupta.net Sent: Friday 13th March 2015 23:33 To: solr-user@lucene.apache.org Subject: Re: Distributed IDF performance np! I forgot to mention that I didn't notice any considerable performance hit in my tests. The QTimes were barely off by 5%. On Fri, Mar 13, 2015 at 3:13 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Oops... I said StatsInfo and that should have been StatsCache (<statsCache .../>). -- Jack Krupansky On Fri, Mar 13, 2015 at 6:04 PM, Anshum Gupta ans...@anshumgupta.net wrote: There's no rough formula or performance data that I know of at this point. About the guidance, if you want to use global stats, my obvious choice would be to use the LRUStatsCache. Before committing, I did run some tests on my macbook but as I said back then, they shouldn't be totally taken at face value. The tests didn't involve any network and were just about 20mn docs and synthetic queries. On Fri, Mar 13, 2015 at 2:08 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Does anybody have any actual performance data or even a rough formula for calculating the overhead for using the new Solr 5.0 Distributed IDF ( SOLR-1632 https://issues.apache.org/jira/browse/SOLR-1632)? And any guidance as far as which StatsInfo plugin is best to use? Are many people now using Distributed IDF as their default? I'm not currently using this, but the existing doc and Jira is too minimal to offer guidance as requested above. Mostly I'm just curious. Thanks. -- Jack Krupansky -- Anshum Gupta -- Anshum Gupta
Re: High memory usage while querying with sort using cursor
Thanks Chris, that makes a lot of sense. On Wed, Mar 18, 2015 at 3:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : A simple query on the collection: ../select?q=*:* works perfectly fine. : : But as soon as i add sorting, it crashes the nodes with OOM: : .../select?q=*:*&sort=unique_id asc&rows=0. if you don't have docValues=true on your unique_id field, then sorting requires it to build up a large in-memory data structure (formerly known as FieldCache, now just an on-the-fly DocValues structure) With explicit docValues constructed at index time, a lot of that data can just live in the operating system's filesystem cache, and lucene only has to load a small portion of it into the heap. -Hoss http://www.lucidworks.com/
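Hoss's fix as a schema.xml fragment (a sketch — type="string" is an assumption; use whatever type unique_id actually has, and note that adding docValues requires reindexing):

```xml
<!-- With docValues="true" the sort structure is built at index time and
     memory-mapped from disk, instead of being un-inverted into the Java heap -->
<field name="unique_id" type="string" indexed="true" stored="true" docValues="true"/>
```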
Re: CloudSolrServer : Could not find collection : gettingstarted
Does the Solr admin UI cloud view show the gettingstarted collection? The graph view might help. It _sounds_ like somehow you didn't actually create the collection. What steps did you follow to create the collection in SolrCloud? It's possible you have the wrong ZK root somehow, I suppose. Best, Erick On Wed, Mar 18, 2015 at 12:32 PM, Adnan Yaqoob itsad...@gmail.com wrote: I'm getting the following exception while trying to upload a document to SolrCloud using CloudSolrServer. Exception in thread "main" org.apache.solr.common.SolrException: *Could not find collection :* gettingstarted at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162) at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) at Test.addDocumentSolrCloud(Test.java:265) at Test.main(Test.java:284) I can query through the Solr admin and am able to upload documents using HttpSolrServer (single instance - non-cloud mode), but not with CloudSolrServer. I've also verified that the collection exists on zookeeper using the zkCli command. Following is the code snippet: CloudSolrServer server = new CloudSolrServer("localhost:2181"); server.setDefaultCollection("gettingstarted"); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", id); doc.addField("name", name); server.add(doc); server.commit(); Not sure what I'm missing. My Zookeeper is running externally with two solr nodes on the same mac -- Regards, *Adnan Yaqoob*
Re: Unable to index rich-text documents in Solr Cloud
Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of the ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin
Re: Unable to index rich-text documents in Solr Cloud
These are the logs that I got from solr.log. I can't seem to figure out what's wrong. Does anyone know? ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252 INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit. 
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=truedistrib.from= http://192.168.2.2:8983/solr/logmill/update.distrib=FROMLEADERopenSearcher=truecommit=truewt=javabinexpungeDeletes=falsecommit_end_point=trueversion=2softCommit=false} {commit=} 0 10 INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10 Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup=lazy in the ExtractingRequestHandler and added the following lines: str name=uprefixignored_/str str name=captureAttrtrue/str str name=fmap.alinks/str str name=fmap.divignored_/str Does the presence of startup=lazy affect the function of ExtractingRequestHandler , or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps your simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. 
When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADERdistrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2Fwt=javabinversion=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
Re: not able to import Data through DIH solr 4.2.1
Alex, thanks for replying. My solrconfig: <lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" /> <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config-new.xml</str> </lst> </requestHandler> On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup=lazy in the ExtractingRequestHandler and added the following lines: str name=uprefixignored_/str str name=captureAttrtrue/str str name=fmap.alinks/str str name=fmap.divignored_/str Does the presence of startup=lazy affect the function of ExtractingRequestHandler , or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps your simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADERdistrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2Fwt=javabinversion=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. 
I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
not able to import Data through DIH solr 4.2.1
Please provide the basic steps to resolve this issue. I am getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
In the URL http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 , the port number 8984 may be an HTTPS port. The HTTP port should be 8983. Hope this helps. -- Best Regards, Charlee Chitsuk === Application Security Product Group *Summit Computer Co., Ltd.* http://www.summitthai.com/ E-Mail: char...@summitthai.com Tel: +66-2-238-0895 to 9 ext. 164 Fax: +66-2-236-7392 === *@ Your Success is Our Pride* -- 2015-03-19 11:49 GMT+07:00 Damien Kamerman dami...@gmail.com: It sounds like https://issues.apache.org/jira/browse/SOLR-5551 Have you checked the solr.log for all nodes? On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) ...
Re: Unable to index rich-text documents in Solr Cloud
On 3/18/2015 1:22 AM, Zheng Lin Edwin Yeo wrote: I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 This request appears to be one of the requests that SolrCloud makes between its different nodes, but it is using the /update handler. I assume that when you sent the request, you sent it to the /update/extract handler because it's a rich-text document? The /update handler can't handle rich-text documents; it is only for documents in json, xml, csv, javabin, etc. that are formatted in specific ways. One thing I'm wondering is whether the Extracting handler requires a shards.qt parameter, also set to /update/extract, to work right with SolrCloud. I have never used that handler myself, so I've got no idea what is required to make it work right. Thanks, Shawn
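[Editor's sketch of the distinction discussed here; the collection name, document id, and file name below are assumed for illustration, not taken from the poster's setup. Rich-text files are posted to the extracting handler as a multipart upload, while /update only accepts the structured formats listed above.]

```shell
# Hypothetical example: send a PDF to the extracting handler, not /update.
# Quoting the whole URL keeps '&' from being interpreted by the shell.
URL='http://localhost:8983/solr/logmill/update/extract?literal.id=doc1&commit=true'
echo "$URL"
# curl "$URL" -F "myfile=@solr-word.pdf"   # multipart upload, run against a live node
```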
Re: not able to import Data through DIH solr 4.2.1
<lib dir="/home/shopclues/solr-4.2.1/example/lib/" regex="mysql-connector-java-5.1.22-bin.jar" />
<lib dir="/home/shopclues/solr-4.2.1/dist/" regex="solr-dataimporthandler-.*\.jar" />

but it is still not working. On Thu, Mar 19, 2015 at 10:41 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Try an absolute path to the jar directory. Hard to tell whether the relative path is correct without knowing exactly how you are running it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 01:00, abhishek tiwari test.mi...@gmail.com wrote: Alex, thanks for replying. My solrconfig:

<lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
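[Editor's note: pulling together what Edwin describes above (a non-lazy handler plus the extra str entries), the handler definition would look roughly like the sketch below. The class name is taken from the stock Solr example solrconfig.xml; treat this as a reconstruction, not his exact file.]

```xml
<!-- Sketch: extracting handler without startup="lazy", with the mappings
     Edwin added. uprefix sends unknown Tika fields to ignored_*;
     fmap.* remaps extracted fields to schema fields. -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```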
Re: Unable to index rich-text documents in Solr Cloud
It sounds like https://issues.apache.org/jira/browse/SOLR-5551 Have you checked the solr.log for all nodes? On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud?
Re: not able to import Data through DIH solr 4.2.1
On 3/18/2015 11:00 PM, abhishek tiwari wrote: my solrconfig: <lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" /> <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> The way that I always recommend dealing with extra jars: In your solr home, create a lib directory. Copy all the extra jars that you need into this directory, including the DIH jar and your JDBC driver jar. Remove all <lib> config elements from solrconfig.xml. In Solr 4.2, you will also need to make sure that your solr.xml has a sharedLib attribute on the <solr> tag, set to "lib". On 4.3 and later, this step is not required ... it will actually cause the jars to NOT work. See the comment on this issue dated 12/Nov/13: https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197 Thanks, Shawn
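[Editor's sketch of the Solr 4.2 step Shawn mentions. The element names follow the legacy solr.xml format; the core name and instanceDir are placeholders, not taken from the poster's setup.]

```xml
<!-- Legacy-style solr.xml (Solr 4.2 sketch): sharedLib points at a
     "lib" directory under the solr home, where the DIH and JDBC jars live. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```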
Re: not able to import Data through DIH solr 4.2.1
Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: not able to import Data through DIH solr 4.2.1
Try an absolute path to the jar directory. Hard to tell whether the relative path is correct without knowing exactly how you are running it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 01:00, abhishek tiwari test.mi...@gmail.com wrote: Alex, thanks for replying. My solrconfig:

<lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Can we use CloudSolrServer for searching data?
I am using SolrCloud with a ZooKeeper setup, but when I try to make a query using the following code snippet, I get an exception.

Code:
CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("gettingstarted");
server.connect();
SolrQuery query = new SolrQuery();
query.setQuery(q);
QueryResponse rsp;
rsp = server.query(query);

Exception:
Exception in thread "main" org.apache.solr.common.SolrException: Collection not found: gettingstarted
at org.apache.solr.client.solrj.impl.CloudSolrServer.getCollectionList(CloudSolrServer.java:679)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:562)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at Test.testSelectQueryCloudServer(Test.java:243)
at Test.main(Test.java:357)

I've verified the collection exists on ZooKeeper using zkCli, and I can query using the Solr admin. Sent from Windows Mail
CloudSolrServer : Could not find collection : gettingstarted
I'm getting the following exception while trying to upload a document to SolrCloud using CloudSolrServer.

Exception in thread "main" org.apache.solr.common.SolrException: *Could not find collection :* gettingstarted
at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at Test.addDocumentSolrCloud(Test.java:265)
at Test.main(Test.java:284)

I can query through the Solr admin and am able to upload a document using HttpSolrServer (single instance - non-cloud mode), but not with CloudSolrServer. I've also verified the collection exists on ZooKeeper using the zkCli command. Following is the code snippet:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("gettingstarted");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
doc.addField("name", name);
server.add(doc);
server.commit();

Not sure what I'm missing. My ZooKeeper is running externally with two Solr nodes on the same Mac. -- Regards, *Adnan Yaqoob*
Re: High memory usage while querying with sort using cursor
: A simple query on the collection: ../select?q=*:* works perfectly fine. : : But as soon as I add sorting, it crashes the nodes with OOM: : .../select?q=*:*&sort=unique_id asc&rows=0. If you don't have docValues=true on your unique_id field, then sorting requires building a large in-memory data structure (formerly known as FieldCache, now just an on-the-fly DocValues structure). With explicit docValues constructed at index time, a lot of that data can just live in the operating system's filesystem cache, and Lucene only has to load a small portion of it into the heap. -Hoss http://www.lucidworks.com/
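[Editor's sketch of the schema.xml change Hoss describes. The field name comes from the thread; the type and other attributes are assumed, and enabling docValues on an existing field requires reindexing.]

```xml
<!-- Sketch: docValues on the sort field keeps the sort data in
     index-time structures (served via the OS filesystem cache) instead
     of a large heap-resident FieldCache-style structure. -->
<field name="unique_id" type="string" indexed="true" stored="true" docValues="true" />
```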
Re: Whole RAM consumed while Indexing.
bq: As you said, do commits after 60000 seconds

No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_, as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the autoCommit tag: maxTime - Maximum amount of time in ms that is allowed to pass since a document was added before automatically triggering a new commit. Also, a lot of the answers about soft and hard commits are here, as I pointed out before; did you read it? https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some term indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how soon after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, the size of the whole index was 6GB after indexing completed. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, the size of the whole index was 5GB after indexing completed. But the number of documents in both scenarios was the same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure about the difference commits make, whether I do frequent commits or not. And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits after 60000 then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. **Overlapping searchers warning messages** are coming in the logs. **numDocs numbers** are changing while documents are being added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60
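[Editor's sketch of the one-minute (60000 ms) intervals Erick recommends, as they would appear in solrconfig.xml. openSearcher=false on the hard commit is the usual pairing described in the linked article; adjust values to your own latency needs.]

```xml
<autoCommit>
  <maxTime>60000</maxTime>       <!-- 60 seconds, in milliseconds -->
  <openSearcher>false</openSearcher>  <!-- flush to disk without reopening searchers -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>       <!-- controls when new documents become searchable -->
</autoSoftCommit>
```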
High memory usage while querying with sort using cursor
Hi all, My setup is as follows: *Collection* size: 32GB, 2 shards, replication factor: 2 (~16GB on each replica). Number of rows: 250 million. 4 *Solr* nodes: RAM: 30GB each. Heap size: 8GB. Version: 4.9.1. Besides the collection in question, the nodes have some other collections present. The total size of all collections on each node is 30GB (which is the same as the amount of RAM on them). A simple query on the collection: ../select?q=*:* works perfectly fine. But as soon as I add sorting, it crashes the nodes with OOM: .../select?q=*:*&sort=unique_id asc&rows=0. I have tried disabling the filter cache and the query-result cache, but that did not help either. Any ideas/suggestions? Thanks, Vaibhav
Re: SolrCloud: data is present on one shard only
please help.. With Regards Aman Tandon On Tue, Mar 17, 2015 at 3:24 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I indexed the data in my SolrCloud architecture (2 shards on 2 separate instances, with replicas of both shards on another 2 instances). And when I look at the index via the admin interface, it is present on a single instance only. Shouldn't the data be present on both shards? Am I doing something wrong? With Regards Aman Tandon
Re: Solr returns incorrect results after sorting
Hi Jim, Yes, you are right: that document has price 499.99. But I want the first record in the group to be considered as part of the main sort. Even if I add price asc in the group.sort, the main sort still does not consider it. group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc,geodist() asc,pricecommon_double asc&sort=pricecommon_double desc Is there any other workaround so that the sort is always based on the first record which is pulled up in each group? Regards, Raj -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193658.html Sent from the Solr - User mailing list archive at Nabble.com.
Add replica on shards
Hi, I have created 8 shards in a collection named ***wikingram**. At that time, I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I tried this - ** sudo curl http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr ** but it is not working. It throws this error -

<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">86</int>
</lst>
<str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str>
<lst name="exception">
<str name="msg">Could not find collection : null</str>
<int name="rspCode">400</int>
</lst>
<lst name="error">
<str name="msg">Could not find collection : null</str>
<int name="code">400</int>
</lst>
</response>

Any help on this?
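[Editor's note: one possible cause of "Could not find collection : null" here, assuming the curl URL was not quoted: the shell treats each '&' as a background operator, so Solr only receives action=ADDREPLICA and no collection parameter. A sketch, with host and collection taken from the post above:]

```shell
# Quote the whole URL so '&' stays part of the query string instead of
# being interpreted as the shell's background operator.
URL='http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr'
echo "$URL"
# curl "$URL"   # run this against a live cluster
```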