Re: Getting dynamic fields using LukeRequest.

2016-08-09 Thread Pranaya Behera

Hi Steve,
  I did look at the Schema API, but it only returns the 
defined dynamic fields, not the indexed fields that match them. For the 
indexed fields created by a defined dynamic-field rule, I guess LukeRequest is 
the only option. (Please correct me if I am wrong.)


Hence I am unable to fetch each and every indexed field created by a 
defined dynamic-field rule.
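
A possible workaround -- a minimal sketch, untested, with "product" standing in
for the collection name: Luke is a core-level admin handler, so under SolrCloud
a single LukeRequest only reports on whichever core it happens to hit. Sending
the request to each shard leader directly and merging the field names should
return the complete set of indexed fields:

import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkCoreNodeProps;

static Set<String> allIndexedFields(CloudSolrClient cloud, String collection)
    throws SolrServerException, IOException {
  Set<String> fields = new TreeSet<>();
  // assumes cloud.connect() has already been called
  ClusterState state = cloud.getZkStateReader().getClusterState();
  for (Slice slice : state.getCollection(collection).getSlices()) {
    // Luke reports on a single core, so hit each shard leader directly
    String coreUrl = new ZkCoreNodeProps(slice.getLeader()).getCoreUrl();
    try (HttpSolrClient core = new HttpSolrClient(coreUrl)) {
      LukeRequest luke = new LukeRequest();
      luke.setNumTerms(0);
      fields.addAll(luke.process(core).getFieldInfo().keySet());
    }
  }
  return fields; // e.g. allIndexedFields(cloudSolrClient, "product")
}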


On 09/08/16 19:26, Steve Rowe wrote:

Not sure what the issue is with LukeRequest, but Solrj has Schema API support: 


You can see which options are supported here: 


--
Steve
www.lucidworks.com
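
For reference, the SolrJ Schema API calls Steve refers to look roughly like
this (a sketch; note that SchemaRequest.DynamicFields returns the *defined*
dynamic-field rules, which is exactly the limitation Pranaya describes above):

import java.util.Map;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

SchemaResponse.DynamicFieldsResponse rsp =
    new SchemaRequest.DynamicFields().process(cloudSolrClient);
for (Map<String, Object> rule : rsp.getDynamicFields()) {
  // prints the patterns, e.g. "*_s" -- not the concrete indexed fields
  System.out.println(rule.get("name"));
}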


On Aug 9, 2016, at 8:52 AM, Pranaya Behera  wrote:

Hi,
 I have the following script to retrieve all the fields in the collection. 
I am using SolrCloud 6.1.0.
LukeRequest lukeRequest = new LukeRequest();
lukeRequest.setNumTerms(0);
lukeRequest.setShowSchema(false);
LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) 
{
  entry.getKey(); // Here fieldInfoMap sometimes has size 0, and sometimes 
it holds incomplete data.
}


Setting showSchema to true doesn't yield any result. Only setting it to false 
yields results, and even then the data is incomplete. As far as I can see, the 
index contains more fields than the response reports.

LukeRequest hits /solr/product/admin/luke?numTerms=0&wt=javabin&version=2 
HTTP/1.1 .

How should it be configured for SolrCloud?
I have already mentioned

<requestHandler name="/admin/luke"
                class="org.apache.solr.handler.admin.LukeRequestHandler" />

in the solrconfig.xml. It doesn't matter whether it is present in the 
solrconfig or not, as I am requesting it from SolrJ.





Re: How to re-index SOLR data

2016-08-09 Thread Erick Erickson
Assuming you can re-index

Consider "collection aliasing". Say your current collection is C1.
Create C2 (using the same cluster, Zookeeper and the like). Go
ahead and index to C2 (however you do that). NOTE: the physical
machines may be _different_ than C1, or not. That's up to you. The
critical bit is that you use the same Zookeeper.

Now, when you are done you use the Collections API CREATEALIAS
command to point a "pseudo collection" to C1 (call it "prod"). This is
seamless to the users.

The flaw in my plan so far is that you probably go at Collection C1
directly. So what you might do is create the "prod" alias and point it at
C1. Now change your LB (or client or whatever) to use the "prod" collection,
then when indexing is complete use CREATEALIAS to point "prod" at C2
instead.

This is actually a quite well-tested process, often used when you want to
change "atomically", e.g. when you reindex the same data nightly but want
all the new data available in its entirety only after it has been QA'd or such.
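
In SolrJ the final switch might look like the sketch below (class and setter
names per the 6.x Collections API support; the equivalent HTTP call is
/admin/collections?action=CREATEALIAS&name=prod&collections=C2):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;

CollectionAdminRequest.CreateAlias alias = new CollectionAdminRequest.CreateAlias();
alias.setAliasName("prod");           // the stable name clients query
alias.setAliasedCollections("C2");    // re-point "prod" from C1 to C2
alias.process(cloudSolrClient);       // a CloudSolrClient on the same ZooKeeper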

Best,
Erick

On Tue, Aug 9, 2016 at 2:43 PM, John Bickerstaff
 wrote:
> In my case, I've done two things  neither of them involved taking the
> data from SOLR to SOLR...  although in my reading, I've seen that this is
> theoretically possible (I.E. sending data from one SOLR server to another
> SOLR server and  having the second SOLR instance re-index...)
>
> I haven't used the python script...  that was news to me, but it sounds
> interesting...
>
> What I've done is one of the following:
>
> a. Get the data from the original source (database, whatever) and massage
> it again so that it's ready for SOLR and then submit it to my new SolrCloud
> for indexing.
>
> b. Keep a separate store of EVERY Solr document as it comes out of my code
> (in xml) and store it in Kafka or a text file.  Then it's easy to push back
> into another SOLR instance any time - multiple times if necessary.
>
> I'm guessing you don't have the data stored away as in "b"...  And if you
> don't have a way of getting the data from some central source, then "a"
> won't work either...  Which leaves you with the concept of sending data
> from SOLR "A" to SOLR "B" and having "B" reindex...
>
> This might serve as a starting point in that case...
> https://wiki.apache.org/solr/HowToReindex
>
> You'll note that there are limitations and a strong caveat against doing
> this with SOLR, but if you have no other option, then it's the best you can
> do.
>
> Do you have the ability to get all the data again from an authoritative
> source?  (Relational Database or something similar?)
>
> On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar 
> wrote:
>
>> Hi John,
>>
>> Thanks so much for your inputs. We have time to build another system. So
>> how did you index the same data on the main SOLR node to the new SOLR node?
>> Did you use the re-index python script? The new data will be indexed
>> correctly with the new rules, but what about the old data?
>>
>> Our SOLR data is around 30GB with around 60 million documents. We use SOLR
>> cloud with 3 solr nodes and 3 zookeepers.
>>
>> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff > >
>> wrote:
>>
>> > In case this helps...
>> >
>> > Assuming you have the resources to build a copy of your production
>> > environment and assuming you have the time, you don't need to take your
>> > production down - or even affect it's processing...
>> >
>> > What I've done (with admittedly smaller data sets) is build a separate
>> > environment (usually on VM's) and once it's set up, I do the new indexing
>> > according to the new "rules"  (Like your change of long to string)
>> >
>> > Then, in a sense, I don't care how long it takes because it is not
>> > affecting Prod.
>> >
>> > When it's done, I simply switch my load balancer to point to the new
>> > environment and shut down the old one.
>> >
>> > To users, this could be seamless if you handle the load balancer
>> correctly
>> > and have it refuse new connections to the old servers while routing all
>> new
>> > connections to the new Solr servers...
>> >
>> > On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar > >
>> > wrote:
>> >
>> > > Hi Nick and Shawn,
>> > >
>> > > Thanks so much for the pointers. I will try that out. Thank you again!
>> > >
>> > > On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <
>> nick.vasily...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, I work on a python Solr Client
>> > > >  library and there is a
>> > > > reindexing helper module that you can use if you are on Solr 4.9+. I
>> > use
>> > > it
>> > > > all the time and I think it works pretty well. You can re-index all
>> > > > documents from a collection into another collection or dump them to
>> the
>> > > > filesystem as JSON. It also supports parallel execution and can run
>> > > > independently on each shard. There is also a way to 

Re: commit is taking 1300 ms

2016-08-09 Thread Midas A
Thanks for replying

Index size: 9GB
Indexing rate: 2000 docs/sec.

Actually it was taking less earlier, but it has suddenly increased.

Currently we do not have any monitoring tool.

On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Midas,
>
> Can you give us more details on your index: size, number of new docs
> between commits. Why do you think 1.3s for a commit is too much, and why do you
> need it to take less? Did you do any system/Solr monitoring?
>
> Emir
>
>
> On 09.08.2016 14:10, Midas A wrote:
>
>> please reply it is urgent.
>>
>> On Tue, Aug 9, 2016 at 11:17 AM, Midas A  wrote:
>>
>> Hi ,
>>>
>>> commit is taking more than 1300 ms. What should I check on the server?
>>>
>>> below is my configuration .
>>>
>>> <autoCommit>
>>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>>
>>> <autoSoftCommit>
>>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>> </autoSoftCommit>
>>>
>>>
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: The Query Elevation Component

2016-08-09 Thread Ryan Yacyshyn
Hi Alessandro,

My mistake, I thought for a second there that the elevation component
needed to actually search through documents, which isn't the case.

Thanks,
Ryan





On Wed, 27 Jul 2016 at 19:15 Alessandro Benedetti 
wrote:

> Hi Ryan,
> can you explain this ?
> " I'd like the search request to search multiple
> fields, but only elevate if the query is found in one of the fields."
>
> You mean that you want to apply the elevation component only if the user
> selected a particular field in the query?
> If I remember well, you can associate a list of
> documents with each query you prefer in the elevation file.
>
> But maybe I misunderstood your question: are you actually thinking of boosting
> the results only if they have a certain match in a particular field?
> Because maybe you are looking for the classic edismax with different field
> boosting instead of the query elevation component.
> Let us know and we can help you better!
>
> Cheers
>
> On Wed, Jul 27, 2016 at 4:49 AM, Ryan Yacyshyn 
> wrote:
>
> > Hi everyone,
> >
> > I'm reading the docs on the query elevation component and some questions
> > came up:
> >
> > Can I specify a field that the elevate component will look at, such as
> only
> > looking at the title field? My search handler (using eDisMax) is
> searching
> > across multiple fields, but if I only want the elevate component to look
> at
> > one field, is this possible? I'd like the search request to search
> multiple
> > fields, but only elevate if the query is found in one of the fields.
> >
> > Also, is there a recommended way to analyze the query? For example, when
> > using the queryFieldType parameter, I'd think I'd only want to use the
> > KeywordTokenizer and maybe lowercasing.
> >
> > Thanks,
> > Ryan
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
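
The edismax field boosting Alessandro mentions would look roughly like this in
SolrJ (a sketch; "title" and "body" are hypothetical field names):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("good");
q.set("defType", "edismax");
q.set("qf", "title^5 body");  // a title match weighs 5x a body match
QueryResponse rsp = cloudSolrClient.query(q);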


RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-09 Thread Tim Chen
Guys, (@Erick & @Shawn),

Thanks for the great suggestions!

I have increased Tomcat MaxThreads from 200 to 10000 on our staging 
environment. So far so good.

I will perform some more indexing test and see how it goes.

Many thanks,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, 8 August 2016 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/7/2016 6:53 PM, Tim Chen wrote:
> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
> unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:745)

I find myself chasing Erick once again. :)  Supplementing what he told you:

There are two things that might be happening here.

1) The Tomcat setting "maxThreads" may limiting the number of threads.
This defaults to 200, and should be increased to 1.  The specific error 
doesn't sound like an application limit, though -- it acts more like Java 
itself can't create the thread.  If you have already adjusted maxThreads, then 
it's more likely to be the second option:

2) The operating system may be imposing a limit on the number of 
processes/threads a user is allowed to start.  On Linux systems, this is 
typically 1024.  For other operating systems, I am not sure what the default 
limit is.

Thanks,
Shawn





Re: Solr and Drupal

2016-08-09 Thread Alexandre Rafalovitch
I have these links from my - random - collection:

http://www.jeffgeerling.com/blog/2016/hosted-apache-solr-drupal-8-support
(has several pre-history articles cross-linked as well)

https://www.youtube.com/watch?v=2yDwbqPwW9M - Markus Kalkbrenner, Nick
Veenhof | The State of Search API Solr and Solr Multilingual 8.x
(Drupal + Solr)

https://www.youtube.com/watch?v=opqsl0OwFLk - Search API Multilingual
Solr Search 8.x Config Creation (Drupal)

Hope this helps,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 August 2016 at 05:06, Davis, Daniel (NIH/NLM) [C]
 wrote:
> John/Rose,
>
> With Drupal 7, the module John pointed to was the module to use.
> With Drupal 8, I have no idea.
>
> -Original Message-
> From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> Sent: Tuesday, August 09, 2016 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr and Drupal
>
> Rose --
>
> Further reading on the drupal site suggests to me that the latest Drupal
> (8?) comes with a generic "connector" that can be tied to any search engine 
> and that the instructions on the page I sent may be superseded by the new 
> connector...
>
> I'm not familiar with Drupal beyond simple experimentation a few years ago, 
> but that's how I'd build it - make a connector and consume the returned data 
> (json, xml, whatever) and then turn it into Drupal-formatted html or 
> something similar.
>
> I think you might want to pursue the particulars on the Drupal list (I assume 
> one exists...)
>
> HTH
>
> On Tue, Aug 9, 2016 at 12:23 PM, Rose, John B  wrote:
>
>> Sameer, John
>>
>> Thanks
>>
>>
>> From: Sameer Maggon 
>> Reply-To: "solr-user@lucene.apache.org" 
>> Date: Tuesday, August 9, 2016 at 1:46 PM
>> To: "solr-user@lucene.apache.org" 
>> Subject: Re: Solr and Drupal
>>
>> Hi John,
>>
>> As John B. mentioned, you can utilize the plugin here -
>> https://www.drupal.org/project/apachesolr. If you are looking to not
>> have to worry about hosting, deployment, scaling and management, you
>> can take a look at SearchStax by Measured Search to get a Solr
>> deployment up and running in a couple of minutes and not have to get
>> into installing Solr and going through a learning curve around setup and 
>> scale.
>>
>>
>> Thanks,
>> Sameer.
>>
>>
>>
>> On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B  jbr...@utk.edu>> wrote:
>> We are looking at Solr for a Drupal web site. We have never installed Solr.
>>
>>
>> From my readings it is not clear exactly what we need to implement a
>> search in Drupal with Solr. Some sites have implied Lucene and/or
>> Tomcat are needed.
>>
>>
>> Can someone point me to the site that explains minimally what is
>> needed to implement Solr within Drupal?
>>
>>
>> Thanks for your time
>>
>>
>>
>> --
>> Sameer Maggon
>> www.measuredsearch.com
>> 1.844.9.SEARCH
>> Measured Search is the only Fully Managed Solr as a Service
>> multi-cloud capable offering.
>> Plus utilize our On Demand Expertise to build your applications faster
>> and with more confidence.
>>


Re: How to re-index SOLR data

2016-08-09 Thread John Bickerstaff
In my case, I've done two things  neither of them involved taking the
data from SOLR to SOLR...  although in my reading, I've seen that this is
theoretically possible (I.E. sending data from one SOLR server to another
SOLR server and  having the second SOLR instance re-index...)

I haven't used the python script...  that was news to me, but it sounds
interesting...

What I've done is one of the following:

a. Get the data from the original source (database, whatever) and massage
it again so that it's ready for SOLR and then submit it to my new SolrCloud
for indexing.

b. Keep a separate store of EVERY Solr document as it comes out of my code
(in xml) and store it in Kafka or a text file.  Then it's easy to push back
into another SOLR instance any time - multiple times if necessary.
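
Replaying option "b" is only a few lines of SolrJ -- a minimal sketch, assuming
each stored record is a complete <add><doc>...</doc></add> XML update message;
the directory path and target URL are hypothetical:

import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.DirectXmlRequest;

SolrClient client = new HttpSolrClient("http://newcluster:8983/solr/collection1");
try (DirectoryStream<Path> docs = Files.newDirectoryStream(Paths.get("/data/solr-docs"))) {
  for (Path p : docs) {
    String xml = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    new DirectXmlRequest("/update", xml).process(client); // replay one stored update
  }
}
client.commit();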

I'm guessing you don't have the data stored away as in "b"...  And if you
don't have a way of getting the data from some central source, then "a"
won't work either...  Which leaves you with the concept of sending data
from SOLR "A" to SOLR "B" and having "B" reindex...

This might serve as a starting point in that case...
https://wiki.apache.org/solr/HowToReindex

You'll note that there are limitations and a strong caveat against doing
this with SOLR, but if you have no other option, then it's the best you can
do.

Do you have the ability to get all the data again from an authoritative
source?  (Relational Database or something similar?)

On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar 
wrote:

> Hi John,
>
> Thanks so much for your inputs. We have time to build another system. So
> how did you index the same data on the main SOLR node to the new SOLR node?
> Did you use the re-index python script? The new data will be indexed
> correctly with the new rules, but what about the old data?
>
> Our SOLR data is around 30GB with around 60 million documents. We use SOLR
> cloud with 3 solr nodes and 3 zookeepers.
>
> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff  >
> wrote:
>
> > In case this helps...
> >
> > Assuming you have the resources to build a copy of your production
> > environment and assuming you have the time, you don't need to take your
> > production down - or even affect it's processing...
> >
> > What I've done (with admittedly smaller data sets) is build a separate
> > environment (usually on VM's) and once it's set up, I do the new indexing
> > according to the new "rules"  (Like your change of long to string)
> >
> > Then, in a sense, I don't care how long it takes because it is not
> > affecting Prod.
> >
> > When it's done, I simply switch my load balancer to point to the new
> > environment and shut down the old one.
> >
> > To users, this could be seamless if you handle the load balancer
> correctly
> > and have it refuse new connections to the old servers while routing all
> new
> > connections to the new Solr servers...
> >
> > On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar  >
> > wrote:
> >
> > > Hi Nick and Shawn,
> > >
> > > Thanks so much for the pointers. I will try that out. Thank you again!
> > >
> > > On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <
> nick.vasily...@gmail.com>
> > > wrote:
> > >
> > > > Hi, I work on a python Solr Client
> > > >  library and there is a
> > > > reindexing helper module that you can use if you are on Solr 4.9+. I
> > use
> > > it
> > > > all the time and I think it works pretty well. You can re-index all
> > > > documents from a collection into another collection or dump them to
> the
> > > > filesystem as JSON. It also supports parallel execution and can run
> > > > independently on each shard. There is also a way to resume if your
> job
> > > > craps out halfway through, provided your existing schema is set up with a
> > good
> > > > date field and unique id.
> > > >
> > > > You can read the documentation here:
> > > > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> > > >
> > > > Code is pretty short and is here:
> > > > https://github.com/moonlitesolutions/SolrClient/
> > blob/master/SolrClient/
> > > > helpers/reindexer.py
> > > >
> > > > Here is sample:
> > > > from SolrClient import SolrClient
> > > > from SolrClient.helpers import Reindexer
> > > >
> > > > r = Reindexer(SolrClient('http://source_solr:8983/solr'),
> SolrClient('
> > > > http://destination_solr:8983/solr') , source_coll='source_
> collection',
> > > > dest_coll='destination-collection')
> > > > r.reindex()
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey 
> > > wrote:
> > > >
> > > > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > > > What would be the best way to re-index the data in the SOLR
> cloud?
> > We
> > > > > > have around 65 million data and we are planning to change the
> > schema
> > > > > > by changing the unique key type from long to string. How long
> does
> > it
> > > > 

Re: How to re-index SOLR data

2016-08-09 Thread Bharath Kumar
Hi John,

Thanks so much for your inputs. We have time to build another system. So
how did you index the same data on the main SOLR node to the new SOLR node?
Did you use the re-index python script? The new data will be indexed
correctly with the new rules, but what about the old data?

Our SOLR data is around 30GB with around 60 million documents. We use SOLR
cloud with 3 solr nodes and 3 zookeepers.

On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff 
wrote:

> In case this helps...
>
> Assuming you have the resources to build a copy of your production
> environment and assuming you have the time, you don't need to take your
> production down - or even affect it's processing...
>
> What I've done (with admittedly smaller data sets) is build a separate
> environment (usually on VM's) and once it's set up, I do the new indexing
> according to the new "rules"  (Like your change of long to string)
>
> Then, in a sense, I don't care how long it takes because it is not
> affecting Prod.
>
> When it's done, I simply switch my load balancer to point to the new
> environment and shut down the old one.
>
> To users, this could be seamless if you handle the load balancer correctly
> and have it refuse new connections to the old servers while routing all new
> connections to the new Solr servers...
>
> On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar 
> wrote:
>
> > Hi Nick and Shawn,
> >
> > Thanks so much for the pointers. I will try that out. Thank you again!
> >
> > On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev 
> > wrote:
> >
> > > Hi, I work on a python Solr Client
> > >  library and there is a
> > > reindexing helper module that you can use if you are on Solr 4.9+. I
> use
> > it
> > > all the time and I think it works pretty well. You can re-index all
> > > documents from a collection into another collection or dump them to the
> > > filesystem as JSON. It also supports parallel execution and can run
> > > independently on each shard. There is also a way to resume if your job
> > > craps out halfway through, provided your existing schema is set up with a
> good
> > > date field and unique id.
> > >
> > > You can read the documentation here:
> > > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> > >
> > > Code is pretty short and is here:
> > > https://github.com/moonlitesolutions/SolrClient/
> blob/master/SolrClient/
> > > helpers/reindexer.py
> > >
> > > Here is sample:
> > > from SolrClient import SolrClient
> > > from SolrClient.helpers import Reindexer
> > >
> > > r = Reindexer(SolrClient('http://source_solr:8983/solr'), SolrClient('
> > > http://destination_solr:8983/solr') , source_coll='source_collection',
> > > dest_coll='destination-collection')
> > > r.reindex()
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey 
> > wrote:
> > >
> > > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > > What would be the best way to re-index the data in the SOLR cloud?
> We
> > > > > have around 65 million data and we are planning to change the
> schema
> > > > > by changing the unique key type from long to string. How long does
> it
> > > > > take to re-index 65 million documents in SOLR and can you please
> > > > > suggest how to do that?
> > > >
> > > > There is no magic bullet.  And there's no way for anybody but you to
> > > > determine how long it's going to take.  There are people who have
> > > > achieved over 50K inserts per second, and others who have difficulty
> > > > reaching 1000 per second.  Many factors affect indexing speed,
> > including
> > > > the size of your documents, the complexity of your analysis, the
> > > > capabilities of your hardware, and how many threads/processes you are
> > > > using at the same time when you index.
> > > >
> > > > Here's some more detailed info about reindexing, but it's probably
> not
> > > > what you wanted to hear:
> > > >
> > > > https://wiki.apache.org/solr/HowToReindex
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Bharath MV Kumar
> >
> > "Life is short, enjoy every moment of it"
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Unique key field type in solr 6.1 schema

2016-08-09 Thread Bharath Kumar
Hi Daniel,

Thanks so much for your inputs. I tried this with SOLR 6.1.0 on fresh data, and
if I use the id field as long, the target site is not able to replay the
transaction when we use delete by id. I opened a ticket in JIRA -
https://issues.apache.org/jira/browse/SOLR-9394. Below is the exception on
the SOLR leader on the target site. Delete by query works, and it also works
if I change id to string.

Error stacktrace on the target site SOLR node leader:-

2016-08-06 08:09:21.091 ERROR (qtp472654579-2699) [c:collection s:shard1
r:core_node3 x:collection] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Invalid Number: A@^L^K0W
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
at
org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
at
org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1154)
at
org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:151)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:112)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)

On Tue, Aug 9, 2016 at 8:55 AM, Daniel Collins 
wrote:

> This vaguely rings a bell, though from a long time ago.  We had our id
> field using the "lowercase" type in Solr, and that broke/changed somewhere
> in the 4.x series (we are on 4.8.1 now and it doesn't work there), so we
> have to revert to a simple "string" type instead.  I know you have a very
> different use case, but I don't think its anything to do with CDCR or 6.x,
> I think its a "problem" in the 4.x series. You might want to check the 4.x
> release notes, and/or try upgrading to 4.10.4 (the latest in the 4.x
> series) just to see what the behavior is there, I think it changed
> somewhere around 4.4 or 4.6...
>
> But I'm talking probably 2-3 years ago, so my memory is hazy on this.
>
> On 9 August 2016 at 08:51, bharath.mvkumar 
> wrote:
>
> > Hi All,
> >
> > I have an issue with cross data center replication, when we delete the
> > document 

Re: How to re-index SOLR data

2016-08-09 Thread John Bickerstaff
In case this helps...

Assuming you have the resources to build a copy of your production
environment and assuming you have the time, you don't need to take your
production down - or even affect it's processing...

What I've done (with admittedly smaller data sets) is build a separate
environment (usually on VM's) and once it's set up, I do the new indexing
according to the new "rules"  (Like your change of long to string)

Then, in a sense, I don't care how long it takes because it is not
affecting Prod.

When it's done, I simply switch my load balancer to point to the new
environment and shut down the old one.

To users, this could be seamless if you handle the load balancer correctly
and have it refuse new connections to the old servers while routing all new
connections to the new Solr servers...

On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar 
wrote:

> Hi Nick and Shawn,
>
> Thanks so much for the pointers. I will try that out. Thank you again!
>
> On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev 
> wrote:
>
> > Hi, I work on a python Solr Client
> >  library and there is a
> > reindexing helper module that you can use if you are on Solr 4.9+. I use
> it
> > all the time and I think it works pretty well. You can re-index all
> > documents from a collection into another collection or dump them to the
> > filesystem as JSON. It also supports parallel execution and can run
> > independently on each shard. There is also a way to resume if your job
> > craps out halfway through, provided your existing schema is set up with a good
> > date field and unique id.
> >
> > You can read the documentation here:
> > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> >
> > Code is pretty short and is here:
> > https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/
> > helpers/reindexer.py
> >
> > Here is sample:
> > from SolrClient import SolrClient
> > from SolrClient.helpers import Reindexer
> >
> > r = Reindexer(SolrClient('http://source_solr:8983/solr'), SolrClient('
> > http://destination_solr:8983/solr') , source_coll='source_collection',
> > dest_coll='destination-collection')
> > r.reindex()
> >
> >
> >
> >
> >
> >
> > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey 
> wrote:
> >
> > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > What would be the best way to re-index the data in the SOLR cloud? We
> > > > have around 65 million data and we are planning to change the schema
> > > > by changing the unique key type from long to string. How long does it
> > > > take to re-index 65 million documents in SOLR and can you please
> > > > suggest how to do that?
> > >
> > > There is no magic bullet.  And there's no way for anybody but you to
> > > determine how long it's going to take.  There are people who have
> > > achieved over 50K inserts per second, and others who have difficulty
> > > reaching 1000 per second.  Many factors affect indexing speed,
> including
> > > the size of your documents, the complexity of your analysis, the
> > > capabilities of your hardware, and how many threads/processes you are
> > > using at the same time when you index.
> > >
> > > Here's some more detailed info about reindexing, but it's probably not
> > > what you wanted to hear:
> > >
> > > https://wiki.apache.org/solr/HowToReindex
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>


Re: Solr DeleteByQuery vs DeleteById

2016-08-09 Thread Bharath Kumar
Hi Danny and Daniel,

Thank you so much for your inputs.

Actually we use deleteById, but because we need the CDCR solution to work
for us, we are having issues with it. deleteById logs a
transaction in the transaction logs, and when that is passed over to the target
site, the CDCR update processor is not able to process it.
The issue occurs when the unique key "id" field type is long. If we use
it as "string", there are no problems. But we already have data in
production, and if we change the schema we need to re-index. That is one of
the reasons we are thinking of using delete by query.

I opened a ticket in JIRA - https://issues.apache.org/jira/browse/SOLR-9394
as well.
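
For reference, the two deletion styles under discussion look like this in
SolrJ (a minimal sketch; the id values and the commitWithin of 1000 ms are
placeholders):

import java.util.Arrays;
import java.util.List;

List<String> ids = Arrays.asList("1", "2", "3", "4");

// delete by id: each unique key is resolved directly, no query is run
cloudSolrClient.deleteById(ids, 1000);

// delete by query: the query must be parsed and re-run on every replica, and
// reordered DBQs can block concurrent updates (LUCENE-7049, quoted below)
cloudSolrClient.deleteByQuery("id:(1 OR 2 OR 3 OR 4)", 1000);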

On Tue, Aug 9, 2016 at 8:58 AM, Daniel Collins 
wrote:

> Seconding that point, we currently do DBQ to "tidy" some of our collections
> and time-bound them (so running "delete anything older than X").  They have
> similar issues with reordering and blocking from time to time.
>
> On 9 August 2016 at 14:20, danny teichthal  wrote:
>
> > Hi Bharath,
> > I'm no expert, but we had some major problems because of deleteByQuery (
> in
> > short DBQ).
> > We ended up replacing all of our DBQ to delete by ids.
> >
> > My suggestion is that if you don't really need it - don't use it.
> > Especially in your case, since you already know the population of ids, it
> > is redundant to query for it.
> >
> > I don't know how CDCR works, but we have a replication factor of 2 on our
> > SolrCloud cluster.
> > Since Solr 5.x , DBQ were stuck for a long while on the replicas,
> blocking
> > all updates.
> > It appears that on the replica side, there's an overhead of reordering
> and
> > executing the same DBQ over and over again, for consistency reasons.
> > It ends up buffering many delete by queries and blocks all updates.
> > In addition there's another defect related to DBQ slowness -
> LUCENE-7049
> >
> >
> >
> >
> >
> > On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar  >
> > wrote:
> >
> > > Hi All,
> > >
> > > We are using SOLR 6.1 and i wanted to know which is better to use -
> > > deleteById or deleteByQuery?
> > >
> > > We have a program which deletes 10 documents every 5 minutes from
> the
> > > SOLR and we do it in a batch of 200 to delete those documents. For that
> > we
> > > now use deleteById(List<String> ids, 1) to delete.
> > > I wanted to know if we change it to deleteByQuery(query, 1) where
> the
> > > query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> > > performance impact?
> > >
> > > We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> > > similar setup on the target site and we use Cross Data Center
> Replication
> > > to replicate from main site.
> > >
> > > Can you please let me know if using deleteByQuery will have any
> impact? I
> > > see it opens real time searcher on all the nodes in cluster.
> > >
> > > --
> > > Thanks & Regards,
> > > Bharath MV Kumar
> > >
> > > "Life is short, enjoy every moment of it"
> > >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: How to re-index SOLR data

2016-08-09 Thread Bharath Kumar
Hi Nick and Shawn,

Thanks so much for the pointers. I will try that out. Thank you again!

On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev 
wrote:

> Hi, I work on a python Solr Client
>  library and there is a
> reindexing helper module that you can use if you are on Solr 4.9+. I use it
> all the time and I think it works pretty well. You can re-index all
> documents from a collection into another collection or dump them to the
> filesystem as JSON. It also supports parallel execution and can run
> independently on each shard. There is also a way to resume if your job
> craps out halfway through, provided your existing schema is set up with a good
> date field and unique id.
>
> You can read the documentation here:
> http://solrclient.readthedocs.io/en/latest/Reindexer.html
>
> Code is pretty short and is here:
> https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/
> helpers/reindexer.py
>
> Here is sample:
> from SolrClient import SolrClient
> from SolrClient.helpers import Reindexer
>
> r = Reindexer(SolrClient('http://source_solr:8983/solr'), SolrClient('
> http://destination_solr:8983/solr') , source_coll='source_collection',
> dest_coll='destination-collection')
> r.reindex()
>
>
>
>
>
>
> On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey  wrote:
>
> > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > What would be the best way to re-index the data in the SOLR cloud? We
> > > have around 65 million data and we are planning to change the schema
> > > by changing the unique key type from long to string. How long does it
> > > take to re-index 65 million documents in SOLR and can you please
> > > suggest how to do that?
> >
> > There is no magic bullet.  And there's no way for anybody but you to
> > determine how long it's going to take.  There are people who have
> > achieved over 50K inserts per second, and others who have difficulty
> > reaching 1000 per second.  Many factors affect indexing speed, including
> > the size of your documents, the complexity of your analysis, the
> > capabilities of your hardware, and how many threads/processes you are
> > using at the same time when you index.
> >
> > Here's some more detailed info about reindexing, but it's probably not
> > what you wanted to hear:
> >
> > https://wiki.apache.org/solr/HowToReindex
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Need Permission to commit feature branch for Pull Request SOLR-8146

2016-08-09 Thread Jan Høydahl
Hi,

You need to create the feature branch in your own fork of the project, not in a 
clone of apache/lucene-solr.
Please see http://wiki.apache.org/solr/HowToContribute#Working_with_GitHub 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 9. aug. 2016 kl. 17.14 skrev Susheel Kumar :
> 
> Hello,
> 
> I created a feature branch for SOLR-8146 so that I can submit a pull request
> (PR) for review. While pushing the feature branch I am getting the error below.
> My github id is susheel2...@gmail.com
> 
> Thanks,
> 
> Susheel
> 
> lucene-solr git:(SOLR-8146) git push origin SOLR-8146
> 
> Username for 'https://github.com': susheel2...@gmail.com
> 
> Password for 'https://susheel2...@gmail.com@github.com':
> 
> remote: Permission to apache/lucene-solr.git denied to sushil2777.



RE: Solr and Drupal

2016-08-09 Thread Davis, Daniel (NIH/NLM) [C]
John/Rose,

With Drupal 7, the module John pointed to was the module to use.
With Drupal 8, I have no idea.

-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com] 
Sent: Tuesday, August 09, 2016 2:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr and Drupal

Rose --

Further reading on the drupal site suggests to me that the latest Drupal
(8?) comes with a generic "connector" that can be tied to any search engine and 
that the instructions on the page I sent may be superseded by the new 
connector...

I'm not familiar with Drupal beyond simple experimentation a few years ago, but 
that's how I'd build it - make a connector and consume the returned data (json, 
xml, whatever) and then turn it into Drupal-formatted html or something similar.

I think you might want to pursue the particulars on the Drupal list (I assume 
one exists...)

HTH

On Tue, Aug 9, 2016 at 12:23 PM, Rose, John B  wrote:

> Sameer, John
>
> Thanks
>
>
> From: Sameer Maggon 
> Reply-To: "solr-user@lucene.apache.org" 
> Date: Tuesday, August 9, 2016 at 1:46 PM
> To: "solr-user@lucene.apache.org" 
> Subject: Re: Solr and Drupal
>
> Hi John,
>
> As John B. mentioned, you can utilize the plugin here - 
> https://www.drupal.org/project/apachesolr. If you are looking to not 
> have to worry about hosting, deployment, scaling and management, you 
> can take a look at SearchStax by Measured Search to get a Solr 
> deployment up and running in a couple of minutes and not have to get 
> into installing Solr and going through a learning curve around setup and 
> scale.
>
>
> Thanks,
> Sameer.
>
>
>
> On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B > wrote:
> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a 
> search in Drupal with Solr. Some sites have implied Lucene and/or 
> Tomcat are needed.
>
>
> Can someone point me to the site that explains minimally what is 
> needed to implement Solr within Drupal?
>
>
> Thanks for your time
>
>
>
> --
> Sameer Maggon
> www.measuredsearch.com
> 1.844.9.SEARCH
> Measured Search is the only Fully Managed Solr as a Service 
> multi-cloud capable offering.
> Plus utilize our On Demand Expertise to build your applications faster 
> and with more confidence.
>


Re: Solr and Drupal

2016-08-09 Thread Rose, John B
Ok thanks.


On 8/9/16, 2:38 PM, "John Bickerstaff"  wrote:

>Rose --
>
>Further reading on the drupal site suggests to me that the latest Drupal
>(8?) comes with a generic "connector" that can be tied to any search engine
>and that the instructions on the page I sent may be superseded by the new
>connector...
>
>I'm not familiar with Drupal beyond simple experimentation a few years ago,
>but that's how I'd build it - make a connector and consume the returned
>data (json, xml, whatever) and then turn it into Drupal-formatted html or
>something similar.
>
>I think you might want to pursue the particulars on the Drupal list (I
>assume one exists...)
>
>HTH
>
>On Tue, Aug 9, 2016 at 12:23 PM, Rose, John B  wrote:
>
>> Sameer, John
>>
>> Thanks
>>
>>
>> From: Sameer Maggon 
>> Reply-To: "solr-user@lucene.apache.org" 
>> Date: Tuesday, August 9, 2016 at 1:46 PM
>> To: "solr-user@lucene.apache.org" 
>> Subject: Re: Solr and Drupal
>>
>> Hi John,
>>
>> As John B. mentioned, you can utilize the plugin here -
>> https://www.drupal.org/project/apachesolr. If you are looking to not have
>> to worry about hosting, deployment, scaling and management, you can take a
>> look at SearchStax by Measured Search to get a Solr deployment up and
>> running in a couple of minutes and not have to get into installing Solr and
>> going through a learning curve around setup and scale.
>>
>>
>> Thanks,
>> Sameer.
>>
>>
>>
>> On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B  jbr...@utk.edu>> wrote:
>> We are looking at Solr for a Drupal web site. We have never installed Solr.
>>
>>
>> From my readings it is not clear exactly what we need to implement a
>> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
>> are needed.
>>
>>
>> Can someone point me to the site that explains minimally what is needed to
>> implement Solr within Drupal?
>>
>>
>> Thanks for your time
>>
>>
>>
>> --
>> Sameer Maggon
>> www.measuredsearch.com
>> 1.844.9.SEARCH
>> Measured Search is the only Fully Managed Solr as a Service multi-cloud
>> capable offering.
>> Plus utilize our On Demand Expertise to build your applications faster and
>> with more confidence.
>>



Re: Solr and Drupal

2016-08-09 Thread John Bickerstaff
Rose --

Further reading on the drupal site suggests to me that the latest Drupal
(8?) comes with a generic "connector" that can be tied to any search engine
and that the instructions on the page I sent may be superseded by the new
connector...

I'm not familiar with Drupal beyond simple experimentation a few years ago,
but that's how I'd build it - make a connector and consume the returned
data (json, xml, whatever) and then turn it into Drupal-formatted html or
something similar.

I think you might want to pursue the particulars on the Drupal list (I
assume one exists...)

HTH

On Tue, Aug 9, 2016 at 12:23 PM, Rose, John B  wrote:

> Sameer, John
>
> Thanks
>
>
> From: Sameer Maggon 
> Reply-To: "solr-user@lucene.apache.org" 
> Date: Tuesday, August 9, 2016 at 1:46 PM
> To: "solr-user@lucene.apache.org" 
> Subject: Re: Solr and Drupal
>
> Hi John,
>
> As John B. mentioned, you can utilize the plugin here -
> https://www.drupal.org/project/apachesolr. If you are looking to not have
> to worry about hosting, deployment, scaling and management, you can take a
> look at SearchStax by Measured Search to get a Solr deployment up and
> running in a couple of minutes and not have to get into installing Solr and
> going through a learning curve around setup and scale.
>
>
> Thanks,
> Sameer.
>
>
>
> On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B > wrote:
> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a
> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
> are needed.
>
>
> Can someone point me to the site that explains minimally what is needed to
> implement Solr within Drupal?
>
>
> Thanks for your time
>
>
>
> --
> Sameer Maggon
> www.measuredsearch.com
> 1.844.9.SEARCH
> Measured Search is the only Fully Managed Solr as a Service multi-cloud
> capable offering.
> Plus utilize our On Demand Expertise to build your applications faster and
> with more confidence.
>


Fwd: Getting dynamic fields using LukeRequest.

2016-08-09 Thread Pranaya Behera




 Forwarded Message 
Subject:Getting dynamic fields using LukeRequest.
Date:   Tue, 9 Aug 2016 18:22:15 +0530
From:   Pranaya Behera 
To: solr-user@lucene.apache.org



Hi,
  I have the following script to retrieve all the fields in the
collection. I am using SolrCloud 6.1.0.
LukeRequest lukeRequest = new LukeRequest();
lukeRequest.setNumTerms(0);
lukeRequest.setShowSchema(false);
LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
Map<String, LukeResponse.FieldInfo> fieldInfoMap =
lukeResponse.getFieldInfo();
for (Map.Entry<String, LukeResponse.FieldInfo> entry :
fieldInfoMap.entrySet()) {
   entry.getKey(); // Here fieldInfoMap sometimes has size 0, and sometimes
it holds incomplete data.
}


Setting showSchema to true doesn't yield any result. Only setting it to
false yields results, and even then the data is incomplete. As far as I can
see, the index contains more fields than the response reports.

LukeRequest hits
/solr/product/admin/luke?numTerms=0&wt=javabin&version=2 HTTP/1.1 .

How should it be configured for SolrCloud?
I have already mentioned

<requestHandler name="/admin/luke"
                class="org.apache.solr.handler.admin.LukeRequestHandler" />

in the solrconfig.xml. It doesn't matter whether it is present in the
solrconfig or not, as I am requesting it from SolrJ.



Re: Solr and Drupal

2016-08-09 Thread Rose, John B
Sameer, John

Thanks


From: Sameer Maggon 
Reply-To: "solr-user@lucene.apache.org" 
Date: Tuesday, August 9, 2016 at 1:46 PM
To: "solr-user@lucene.apache.org" 
Subject: Re: Solr and Drupal

Hi John,

As John B. mentioned, you can utilize the plugin here - 
https://www.drupal.org/project/apachesolr.
 If you are looking to not have to worry about hosting, deployment, scaling and 
management, you can take a look at SearchStax by Measured Search to get a Solr 
deployment up and running in a couple of minutes and not have to get into 
installing Solr and going through a learning curve around setup and scale.


Thanks,
Sameer.



On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B 
> wrote:
We are looking at Solr for a Drupal web site. We have never installed Solr.


From my readings it is not clear exactly what we need to implement a search in 
Drupal with Solr. Some sites have implied Lucene and/or Tomcat are needed.


Can someone point me to the site that explains minimally what is needed to 
implement Solr within Drupal?


Thanks for your time



--
Sameer Maggon
www.measuredsearch.com
1.844.9.SEARCH
Measured Search is the only Fully Managed Solr as a Service multi-cloud capable 
offering.
Plus utilize our On Demand Expertise to build your applications faster and with 
more confidence.


Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-09 Thread Shawn Heisey
On 8/8/2016 11:09 AM, Ritesh Kumar (Avanade) wrote:
> This is great but where can I do this change in SOLR 6 as I have
> implemented CDCR.

In Solr 6, the chance of using Tomcat will be near zero, and the
maxThreads setting in Solr's Jetty config should already be set to 10000.

If you're seeing this same OOME (can't create a new thread) in Solr 6,
then the problem is most likely going to be at the operating system
level.  Exactly how to increase the number of processes/threads that
Solr can create will vary depending on the operating system you're
running.  For help, consult documentation or support resources for your
OS, or maybe Google.

If you're seeing a different problem, then please send a brand new
message to the list detailing your problem.

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn



Re: Solr and Drupal

2016-08-09 Thread Sameer Maggon
Hi John,

As John B. mentioned, you can utilize the plugin here -
https://www.drupal.org/project/apachesolr.

If you are looking to not have to worry about hosting, deployment, scaling
and management, you can take a look at SearchStax by Measured Search to get
a Solr deployment up and running in a couple of minutes and not have to get
into installing Solr and going through a learning curve around setup and
scale.


Thanks,
Sameer.


On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B  wrote:

> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a
> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
> are needed.
>
>
> Can someone point me to the site that explains minimally what is needed to
> implement Solr within Drupal?
>
>
> Thanks for your time
>



-- 
*Sameer Maggon*
www.measuredsearch.com

1.844.9.SEARCH
Measured Search is the only *Fully Managed Solr as a Service* multi-cloud
capable offering.
Plus utilize our *On Demand Expertise* to build your applications faster
and with more confidence.


Re: Solr and Drupal

2016-08-09 Thread John Bickerstaff
This might be a good place to start...

https://www.drupal.org/project/apachesolr

On Tue, Aug 9, 2016 at 11:11 AM, Rose, John B  wrote:

> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a
> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
> are needed.
>
>
> Can someone point me to the site that explains minimally what is needed to
> implement Solr within Drupal?
>
>
> Thanks for your time
>


Modifying fl in QParser

2016-08-09 Thread Beale, Jim (US-KOP)
Hi,

Is it possible to modify the fl SolrParam to append selected dynamic fields 
while rewriting a query in QParser.parse()?
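
One way this is sometimes attempted -- a sketch only, with a hypothetical
dynamic-field pattern; whether mutating the request params from inside
parse() fits your component setup is a separate question:

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
String fl = params.get(CommonParams.FL, "*");
params.set(CommonParams.FL, fl + ",price_*"); // append a hypothetical dynamic-field pattern
req.setParams(params); // req is the SolrQueryRequest field available to a QParser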

Thanks in advance!


Jim Beale
Senior Lead Developer
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Mobile: 610-220-3067


The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you. **hibu IT Code:141459300**


Solr and Drupal

2016-08-09 Thread Rose, John B
We are looking at Solr for a Drupal web site. We have never installed Solr.


From my readings it is not clear exactly what we need to implement a search in 
Drupal with Solr. Some sites have implied Lucene and/or Tomcat are needed.


Can someone point me to the site that explains minimally what is needed to 
implement Solr within Drupal?


Thanks for your time


Help for -- Filter in the text field + highlight + no affect on boosting(if done with q instead of fq)

2016-08-09 Thread Raleraskar, Mayur
Hi All,
I am using Solr for search functionality here @ eBay reviews team.

I need to implement search functionality with the q parameter but do not want it 
to affect boosting or relevancy. How can I achieve that? Effectively I want it to 
perform just like a filter.
My query is like
SolrIp:Port/select?defType=edismax=text%3Agood=max%28relevanceScore_dx%2C+0.1%29=recip%28abs%28ms%28NOW%2FYEAR%2B1YEAR%2ClastEditedDate_dt%29%29%2C+3.16e-11%2C1%2C1%29=0=5=true=%7B%21ex%3Dlab_ix%2Clocale_sx%7Drating_ix=%7B%21ex%3Dlab_ix%7Dlabel_ix=count=0=100=0=status_ix%3A1=%7B%21tag%3Dlab_ix%7Dlabel_ix%3A2=siteId_ix%3A0=subjectReferenceId_lx%3A1040409165+AND+subjectType_sx%3AP=json=true=id=true

OR
I can search/filter with the fq parameter, but I need to highlight the words that 
are filtered by fq. Just the words in the text that match the fq regex, not the 
entire text field.
My query is like
SolrIp:Port/select?defType=edismax=*%3A*=max%28relevanceScore_dx%2C+0.1%29=recip%28abs%28ms%28NOW%2FYEAR%2B1YEAR%2ClastEditedDate_dt%29%29%2C+3.16e-11%2C1%2C1%29=0=50=true=%7B%21ex%3Dlab_ix%2Clocale_sx%7Drating_ix=%7B%21ex%3Dlab_ix%7Dlabel_ix=count=0=100=0=status_ix%3A1=%7B%21tag%3Dlab_ix%7Dlabel_ix%3A2=siteId_ix%3A0=subjectReferenceId_lx%3A1040409165+AND+subjectType_sx%3AP=text%3Agood=json=true=id


Thanks in advance,
Mayur



Re: Can a MergeStrategy filter returned docs?

2016-08-09 Thread tedsolr
After some more digging I've learned that the Query gets called more than
once on the same shard, but with a different shard purpose. I don't
understand the flow, but I assume that one call is triggering the
transform() via a path that does not pass through the document collector. I
also don't know why this flow changes just because of sharding. But perhaps
the first pass through the docs are collected and sorted and the merge
happens (PURPOSE_GET_TOP_IDS), and then the requested field data is gathered
(PURPOSE_GET_FIELDS). It's this second pass that barfs in the transform()
method because my custom analytics data doesn't exist. I would expect an NPE
instead of a read error; nonetheless, if I wrap the mycustomdata.get() with a
null check, the search works as expected.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-a-MergeStrategy-filter-returned-docs-tp4290446p4290998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to re-index SOLR data

2016-08-09 Thread Nick Vasilyev
Hi, I work on a python Solr Client
 library and there is a
reindexing helper module that you can use if you are on Solr 4.9+. I use it
all the time and I think it works pretty well. You can re-index all
documents from a collection into another collection or dump them to the
filesystem as JSON. It also supports parallel execution and can run
independently on each shard. There is also a way to resume if your job
craps out halfway through, provided your existing schema is set up with a good
date field and unique id.

You can read the documentation here:
http://solrclient.readthedocs.io/en/latest/Reindexer.html

Code is pretty short and is here:
https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py

Here is a sample:
from SolrClient import SolrClient
from SolrClient.helpers import Reindexer

r = Reindexer(SolrClient('http://source_solr:8983/solr'),
              SolrClient('http://destination_solr:8983/solr'),
              source_coll='source_collection',
              dest_coll='destination-collection')
r.reindex()






On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey  wrote:

> On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > What would be the best way to re-index the data in the SOLR cloud? We
> > have around 65 million documents and we are planning to change the schema
> > by changing the unique key type from long to string. How long does it
> > take to re-index 65 million documents in SOLR and can you please
> > suggest how to do that?
>
> There is no magic bullet.  And there's no way for anybody but you to
> determine how long it's going to take.  There are people who have
> achieved over 50K inserts per second, and others who have difficulty
> reaching 1000 per second.  Many factors affect indexing speed, including
> the size of your documents, the complexity of your analysis, the
> capabilities of your hardware, and how many threads/processes you are
> using at the same time when you index.
>
> Here's some more detailed info about reindexing, but it's probably not
> what you wanted to hear:
>
> https://wiki.apache.org/solr/HowToReindex
>
> Thanks,
> Shawn
>
>


Re: Solr DeleteByQuery vs DeleteById

2016-08-09 Thread Daniel Collins
Seconding that point: we currently use DBQ to "tidy" some of our collections
and time-bound them (running "delete anything older than X"), and they show
similar issues with reordering and blocking from time to time.

On 9 August 2016 at 14:20, danny teichthal  wrote:

> Hi Bharath,
> I'm no expert, but we had some major problems because of deleteByQuery ( in
> short DBQ).
> We ended up replacing all of our DBQ to delete by ids.
>
> My suggestion is that if you don't really need it - don't use it.
> Especially in your case, since you already know the population of ids, it
> is redundant to query for it.
>
> I don't know how CDCR works, but we have a replication factor of 2 on our
> SolrCloud cluster.
> Since Solr 5.x , DBQ were stuck for a long while on the replicas, blocking
> all updates.
> It appears that on the replica side, there's an overhead of reordering and
> executing the same DBQ over and over again, for consistency reasons.
> It ends up buffering many delete by queries and blocks all updates.
> In addition, there's another defect related to DBQ slowness - LUCENE-7049
>
>
>
>
>
> On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar 
> wrote:
>
> > Hi All,
> >
> > We are using SOLR 6.1 and i wanted to know which is better to use -
> > deleteById or deleteByQuery?
> >
> > We have a program which deletes 10 documents every 5 minutes from Solr,
> > and we do it in batches of 200 to delete those documents. For that we
> > now use deleteById(List<String> ids, 1) to delete.
> > I wanted to know if we change it to deleteByQuery(query, 1) where the
> > query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> > performance impact?
> >
> > We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> > similar setup on the target site and we use Cross Data Center Replication
> > to replicate from main site.
> >
> > Can you please let me know if using deleteByQuery will have any impact? I
> > see it opens real time searcher on all the nodes in cluster.
> >
> > --
> > Thanks & Regards,
> > Bharath MV Kumar
> >
> > "Life is short, enjoy every moment of it"
> >
>


Re: Unique key field type in solr 6.1 schema

2016-08-09 Thread Daniel Collins
This vaguely rings a bell, though from a long time ago.  We had our id
field using the "lowercase" type in Solr, and that broke/changed somewhere
in the 4.x series (we are on 4.8.1 now and it doesn't work there), so we
had to revert to a simple "string" type instead.  I know you have a very
different use case, but I don't think it's anything to do with CDCR or 6.x;
I think it's a "problem" in the 4.x series. You might want to check the 4.x
release notes, and/or try upgrading to 4.10.4 (the latest in the 4.x
series) just to see what the behavior is there; I think it changed
somewhere around 4.4 or 4.6...

But I'm talking probably 2-3 years ago, so my memory is hazy on this.
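
For reference, the plain "string" setup we reverted to looks roughly like
this (a sketch from memory; the field name is from your mail, the rest is
illustrative):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>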

On 9 August 2016 at 08:51, bharath.mvkumar 
wrote:

> Hi All,
>
> I have an issue with cross data center replication: when we delete a
> document by id from the main site, the target site document is not deleted.
> I have the id field, which is the unique field for my schema, configured
> as "long".
>
> If I change the type to "string", it works fine. Is there any issue using
> long? We migrated from 4.4 to 6.1, and we had the id field as long.
> Can you please help me with this? Really appreciate your help.
>
> I see the below error on the target site:-
>
>  o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid
> Number:
>   at org.apache.solr.schema.TrieField.readableToIndexed(
> TrieField.java:537)
> at
> org.apache.solr.update.DeleteUpdateCommand.getIndexedId(
> DeleteUpdateCommand.java:65)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(
> DistributedUpdateProcessor.java:1495)
> at
> org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(
> CdcrUpdateProcessor.java:85)
>
> Thanks,
> Bharath Kumar
>
>
>
>


Re: Returning parent's field which searching for child

2016-08-09 Thread Zheng Lin Edwin Yeo
If I drop the childNo_s:123456 part, the result will return both parent and
child, which is what I want.
But this will match all the children, since the only filter now is
contentType:child.

It didn't work when I put only childNo_s:123456 and removed the
contentType:child part.

Regards,
Edwin


On 9 August 2016 at 22:03, Mikhail Khludnev  wrote:

> I wonder why {!parent} doesn't have a subordinate clause. I suggest dropping the
> childNo_s:123456 part of the child filter, just to get something to start from.
>
> On Tue, Aug 9, 2016 at 4:09 PM, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for your reply.
> > I tried this, but it only returns the parent's record and not the child
> > record.
> >
> > http://localhost/solr/collection1/select?q={!parent+which%3D%22contentType:parent%22}&fl=*,[child+parentFilter%3D$parent_filter+childFilter%3D$child_filter]&parent_filter=contentType:parent&child_filter=(contentType:child+AND+childNo_s:123456)
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 9 August 2016 at 18:23, Mikhail Khludnev  wrote:
> >
> > > Hello Edwin,
> > > Have you tried to combine q={!parent ..}... &fl=*,[child ...] ?
> > >
> > > On Tue, Aug 9, 2016 at 12:59 PM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Would like to check, is it possible to return certain fields from
> > > parent's
> > > > record, when we are searching for fields that are only contained in
> > child
> > > > records.
> > > >
> > > > For example, for this query:
> > > > http://localhost:8983/solr/collection1/select?q=childField:123
> > > >
> > > > The child field can only be found in child, so this search will not return
> > > > any parent's record. But if I want to return, say, its parent's ID
> > > > together with the above query, is it possible to be done?
> > > >
> > > > I'm using Solr 6.1.0.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Pablo Anzorena
Thanks Shawn, I understood perfectly well. One important thing in my use
case is that I only have one entry point for indexing Solr, so I won't have
any problems with multiple threads trying to update the index.

So what can I do if I have to index in Solr and also in Postgres, and I need
to do it transactionally?

I imagine something like this (sketched in code below):
1) Open a distributed transaction in PostgreSQL and "index" the data with
the global transaction id.
1.1) If some problem occurs, roll back Postgres. End of transaction.
2) Index data in Solr. If no problem occurs, commit in Solr and then commit
in Postgres. End of transaction.
2.1) If some problem occurs in Solr, roll back Solr and roll back Postgres.
End of transaction.
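
A rough sketch of that flow in Java (hedged: insertIntoPostgres() is a
hypothetical helper, the JDBC URL is a placeholder, and Solr's rollback is
global and not supported in SolrCloud, so this is illustrative only):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DualWriteSketch {
  static void indexTransactionally(SolrClient solr, List<SolrInputDocument> docs)
      throws Exception {
    try (Connection pg = DriverManager.getConnection("jdbc:postgresql://localhost/db")) {
      pg.setAutoCommit(false);                // step 1: open the transaction
      try {
        insertIntoPostgres(pg, docs);         // write rows, still uncommitted
        solr.add(docs);                       // step 2: index in Solr
        solr.commit();                        // commit Solr first...
        pg.commit();                          // ...then commit Postgres
      } catch (Exception e) {
        pg.rollback();                        // steps 1.1 / 2.1
        solr.rollback();                      // best effort, see caveat above
        throw e;
      }
    }
  }

  static void insertIntoPostgres(Connection pg, List<SolrInputDocument> docs) {
    // hypothetical helper: map each document to an INSERT statement
  }
}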

2016-08-09 11:24 GMT-03:00 Shawn Heisey :

> On 8/9/2016 7:55 AM, Pablo Anzorena wrote:
> > That's it. Thanks.
>
> Solr doesn't support transactions in the way that most people with a
> database background imagine them.
>
> With a typical database server, all changes to the database that happen
> on a single DB connection can be committed or rolled back completely
> independently from updates that happen on other DB connections.
>
> Solr doesn't work this way.
>
> In a Lucene index (Solr is a Lucene program), a "transaction" is all
> updates made since the last commit with openSearcher=true.  This
> includes ALL updates made, regardless of where they came from.  So if
> you have a dozen different threads/processes making changes to your Solr
> index, then have something do a commit, all of the updates made by those
> 12 sources before the commit will be committed.  There is no concept of
> an individual transaction.
>
> Adding the DB transaction model would be a *major* development effort,
> and there's a good chance that adding it would destroy the blazing
> search performance that Solr and Lucene are known for.
>
> Thanks,
> Shawn
>
>


Re: Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Walter Underwood
Solr does not have transactions.

A batch is submitted, then processed. The command to process the batch is named 
“commit”, but it isn’t very much like a database commit.

Batch submissions are not isolated between clients. If three batches are being 
submitted at the same time, a commit command from one client will cause all 
pending documents to be processed, not just the documents from that client.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 9, 2016, at 6:55 AM, Pablo Anzorena  wrote:
> 
> That's it.
> 
> Thanks.



Need Permission to commit feature branch for Pull Request SOLR-8146

2016-08-09 Thread Susheel Kumar
Hello,

I created a feature branch for SOLR-8146 so that I can submit a pull request
(PR) for review. While pushing the feature branch I am getting the below error.
My GitHub id is susheel2...@gmail.com

Thanks,

Susheel

lucene-solr git:(SOLR-8146) git push origin SOLR-8146

Username for 'https://github.com': susheel2...@gmail.com

Password for 'https://susheel2...@gmail.com@github.com':

remote: Permission to apache/lucene-solr.git denied to sushil2777.


Re: Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Shawn Heisey
On 8/9/2016 7:55 AM, Pablo Anzorena wrote:
> That's it. Thanks.

Solr doesn't support transactions in the way that most people with a
database background imagine them.

With a typical database server, all changes to the database that happen
on a single DB connection can be committed or rolled back completely
independently from updates that happen on other DB connections.

Solr doesn't work this way.

In a Lucene index (Solr is a Lucene program), a "transaction" is all
updates made since the last commit with openSearcher=true.  This
includes ALL updates made, regardless of where they came from.  So if
you have a dozen different threads/processes making changes to your Solr
index, then have something do a commit, all of the updates made by those
12 sources before the commit will be committed.  There is no concept of
an individual transaction.
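
As a hedged illustration (the client and document names are invented):

// Two clients add documents independently, with no commit yet.
clientA.add(docFromThreadA);
clientB.add(docFromThreadB);
// A commit issued by either client makes BOTH documents searchable.
clientA.commit();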

Adding the DB transaction model would be a *major* development effort,
and there's a good chance that adding it would destroy the blazing
search performance that Solr and Lucene are known for.

Thanks,
Shawn



Re: Custom SearchHandler with custom QueryResponseWriter

2016-08-09 Thread Alexandre Rafalovitch
Where did you put the jar that contains those custom classes? Perhaps
they are not being loaded. Is there an error message in the logs?

Are you doing this in standalone Solr or in cloud mode?

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 9 August 2016 at 20:04, Markus Boese  wrote:
> Hello everyone,
> I'm trying to use my QueryResponseWriter with my SearchHandler, but the
> write(...) method of my QueryResponseWriter is not called.
>
> Excerpt of my solrconfig.xml:
>
> <queryResponseWriter name="wwWriter" class="..." />
>
> <searchComponent name="wwSearcher" class="my.search.component.WWSearchComponent" />
>
> <requestHandler name="/ww" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="wt">wwWriter</str>
>     <str name="...">ww_search_key</str>
>     <int name="rows">10</int>
>   </lst>
>   <arr name="components">
>     <str>wwSearcher</str>
>   </arr>
> </requestHandler>
>
> Could anyone explain why a request to "/ww" does not include a call to
> WWResponseWriter?
> I just want to render custom JSON as output.
>
> --
> Greetz,
>
> Markus Boese


Re: Returning parent's field which searching for child

2016-08-09 Thread Mikhail Khludnev
I wonder why {!parent} doesn't have a subordinate clause. I suggest dropping the
childNo_s:123456 part of the child filter, just to get something to start from.
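
For example, something like this might be a starting point (parameters
decoded and combined from your earlier mail; untested):

q={!parent which="contentType:parent"}(contentType:child AND childNo_s:123456)&fl=*,[child parentFilter="contentType:parent"]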

On Tue, Aug 9, 2016 at 4:09 PM, Zheng Lin Edwin Yeo 
wrote:

> Hi Mikhail,
>
> Thanks for your reply.
> I tried this, but it only returns the parent's record and not the child
> record.
>
> http://localhost/solr/collection1/select?q={!parent+which%3D%22contentType:parent%22}&fl=*,[child+parentFilter%3D$parent_filter+childFilter%3D$child_filter]&parent_filter=contentType:parent&child_filter=(contentType:child+AND+childNo_s:123456)
>
>
> Regards,
> Edwin
>
>
> On 9 August 2016 at 18:23, Mikhail Khludnev  wrote:
>
> > Hello Edwin,
> > Have you tried to combine q={!parent ..}... &fl=*,[child ...] ?
> >
> > On Tue, Aug 9, 2016 at 12:59 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > Would like to check, is it possible to return certain fields from
> > parent's
> > > record, when we are searching for fields that are only contained in
> child
> > > records.
> > >
> > > For example, for this query:
> > > http://localhost:8983/solr/collection1/select?q=childField:123
> > >
> > > The child field can only be found in child, so this search will not return
> > > any parent's record. But if I want to return, say, its parent's ID
> > > together with the above query, is it possible to be done?
> > >
> > > I'm using Solr 6.1.0.
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Getting dynamic fields using LukeRequest.

2016-08-09 Thread Steve Rowe
Not sure what the issue is with LukeRequest, but Solrj has Schema API support: 


You can see which options are supported here: 
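
For example, here is a minimal SolrJ sketch that lists the defined dynamic
field patterns via the Schema API (the zkHost and collection name are
placeholders; note this returns the declared patterns, not the concrete
indexed fields):

import java.util.Map;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class ListDynamicFieldPatterns {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
      SchemaRequest.DynamicFields request = new SchemaRequest.DynamicFields();
      SchemaResponse.DynamicFieldsResponse response = request.process(client, "product");
      for (Map<String, Object> dynamicField : response.getDynamicFields()) {
        System.out.println(dynamicField.get("name")); // e.g. *_s, *_dt
      }
    }
  }
}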


--
Steve
www.lucidworks.com

> On Aug 9, 2016, at 8:52 AM, Pranaya Behera  wrote:
> 
> Hi,
> I have the following script to retrieve all the fields in the collection. 
> I am using SolrCloud 6.1.0.
> LukeRequest lukeRequest = new LukeRequest();
> lukeRequest.setNumTerms(0);
> lukeRequest.setShowSchema(false);
> LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
> Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
> for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) {
>   entry.getKey(); // fieldInfoMap is sometimes empty and sometimes contains incomplete data
> }
> 
> 
> Setting showSchema to true doesn't yield any result. Only making it false
> yields results, and even then the data is incomplete. As I can see in the doc,
> it has more than what the response says it has.
> 
> LukeRequest hits /solr/product/admin/luke?numTerms=0&wt=javabin&version=2 HTTP/1.1.
> 
> How should it be configured for SolrCloud?
> I have already mentioned
>
> <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
>
> in the solrconfig.xml. It doesn't matter whether it is present in the
> solrconfig or not, as I am requesting it from SolrJ.
> 



Re: How to re-index SOLR data

2016-08-09 Thread Shawn Heisey
On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> What would be the best way to re-index the data in the SOLR cloud? We
> have around 65 million documents and we are planning to change the schema
> by changing the unique key type from long to string. How long does it
> take to re-index 65 million documents in SOLR and can you please
> suggest how to do that?

There is no magic bullet.  And there's no way for anybody but you to
determine how long it's going to take.  There are people who have
achieved over 50K inserts per second, and others who have difficulty
reaching 1000 per second.  Many factors affect indexing speed, including
the size of your documents, the complexity of your analysis, the
capabilities of your hardware, and how many threads/processes you are
using at the same time when you index.

Here's some more detailed info about reindexing, but it's probably not
what you wanted to hear:

https://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Does solr support two phase commit or any other distributed transaction protocol?

2016-08-09 Thread Pablo Anzorena
That's it.

Thanks.


Re: commit it taking 1300 ms

2016-08-09 Thread Emir Arnautovic

Hi Midas,

Can you give us more details on your index: size, number of new docs
between commits? Why do you think 1.3s for a commit is too much, and why do
you need it to take less? Did you do any system/Solr monitoring?


Emir

On 09.08.2016 14:10, Midas A wrote:

Please reply, it is urgent.

On Tue, Aug 9, 2016 at 11:17 AM, Midas A  wrote:


Hi,

commit is taking more than 1300 ms. What should I check on the server?

Below is my configuration:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr DeleteByQuery vs DeleteById

2016-08-09 Thread danny teichthal
Hi Bharath,
I'm no expert, but we had some major problems because of deleteByQuery ( in
short DBQ).
We ended up replacing all of our DBQ to delete by ids.

My suggestion is that if you don't really need it - don't use it.
Especially in your case, since you already know the population of ids, it
is redundant to query for it.
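
For example, a minimal SolrJ sketch of the id-based variant (the ZooKeeper
hosts, collection name, and the 1000 ms commitWithin are illustrative):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class DeleteByIdSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181")) {
      client.setDefaultCollection("collection1");
      // Delete a known batch of ids directly instead of sending
      // deleteByQuery("id:1 OR id:2 OR ..."), which replicas may have
      // to reorder and re-execute.
      List<String> ids = Arrays.asList("1", "2", "3", "4");
      client.deleteById(ids, 1000); // commitWithin of 1000 ms
    }
  }
}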

I don't know how CDCR works, but we have a replication factor of 2 on our
SolrCloud cluster.
Since Solr 5.x , DBQ were stuck for a long while on the replicas, blocking
all updates.
It appears that on the replica side, there's an overhead of reordering and
executing the same DBQ over and over again, for consistency reasons.
It ends up buffering many delete by queries and blocks all updates.
In addition, there's another defect related to DBQ slowness - LUCENE-7049





On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar 
wrote:

> Hi All,
>
> We are using SOLR 6.1 and i wanted to know which is better to use -
> deleteById or deleteByQuery?
>
> We have a program which deletes 10 documents every 5 minutes from Solr,
> and we do it in batches of 200 to delete those documents. For that we
> now use deleteById(List<String> ids, 1) to delete.
> I wanted to know if we change it to deleteByQuery(query, 1) where the
> query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> performance impact?
>
> We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> similar setup on the target site and we use Cross Data Center Replication
> to replicate from main site.
>
> Can you please let me know if using deleteByQuery will have any impact? I
> see it opens real time searcher on all the nodes in cluster.
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>


Re: Returning parent's field which searching for child

2016-08-09 Thread Zheng Lin Edwin Yeo
Hi Mikhail,

Thanks for your reply.
I tried this, but it only returns the parent's record and not the child
record.

http://localhost/solr/collection1/select?q={!parent+which%3D%22contentType:parent%22}&fl=*,[child+parentFilter%3D$parent_filter+childFilter%3D$child_filter]&parent_filter=contentType:parent&child_filter=(contentType:child+AND+childNo_s:123456)


Regards,
Edwin


On 9 August 2016 at 18:23, Mikhail Khludnev  wrote:

> Hello Edwin,
> Have you tried to combine q={!parent ..}... &fl=*,[child ...] ?
>
> On Tue, Aug 9, 2016 at 12:59 PM, Zheng Lin Edwin Yeo  >
> wrote:
>
> > Hi,
> >
> > Would like to check, is it possible to return certain fields from
> parent's
> > record, when we are searching for fields that are only contained in child
> > records.
> >
> > For example, for this query:
> > http://localhost:8983/solr/collection1/select?q=childField:123
> >
> > The child field can only be found in child, so this search will not return
> > any parent's record. But if I want to return, say, its parent's ID
> > together with the above query, is it possible to be done?
> >
> > I'm using Solr 6.1.0.
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Getting dynamic fields using LukeRequest.

2016-08-09 Thread Pranaya Behera

Hi,
 I have the following script to retrieve all the fields in the 
collection. I am using SolrCloud 6.1.0.

LukeRequest lukeRequest = new LukeRequest();
lukeRequest.setNumTerms(0);
lukeRequest.setShowSchema(false);
LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) {
  entry.getKey(); // fieldInfoMap is sometimes empty and sometimes contains incomplete data
}


Setting showSchema to true doesn't yield any result. Only making it
false yields results, and even then the data is incomplete. As I can see in
the doc, it has more than what the response says it has.


LukeRequest hits
/solr/product/admin/luke?numTerms=0&wt=javabin&version=2 HTTP/1.1.


How should it be configured for SolrCloud?
I have already mentioned

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

in the solrconfig.xml. It doesn't matter whether it is present in the
solrconfig or not, as I am requesting it from SolrJ.




Re: commit it taking 1300 ms

2016-08-09 Thread Midas A
Please reply, it is urgent.

On Tue, Aug 9, 2016 at 11:17 AM, Midas A  wrote:

> Hi,
>
> commit is taking more than 1300 ms. What should I check on the server?
>
> Below is my configuration:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
>


Re: Returning parent's field which searching for child

2016-08-09 Thread Mikhail Khludnev
Hello Edwin,
Have you tried to combine q={!parent ..}... &fl=*,[child ...] ?

On Tue, Aug 9, 2016 at 12:59 PM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Would like to check, is it possible to return certain fields from parent's
> record, when we are searching for fields that are only contained in child
> records.
>
> For example, for this query:
> http://localhost:8983/solr/collection1/select?q=childField:123
>
> The child field can only be found in child, so this search will not return
> any parent's record. But if I want to return, say, its parent's ID
> together with the above query, is it possible to be done?
>
> I'm using Solr 6.1.0.
>
> Regards,
> Edwin
>



-- 
Sincerely yours
Mikhail Khludnev


Custom SearchHandler with custom QueryResponseWriter

2016-08-09 Thread Markus Boese
Hello everyone,
I'm trying to use my QueryResponseWriter with my SearchHandler, but the
write(...) method of my QueryResponseWriter is not called.

Excerpt of my solrconfig.xml:

<queryResponseWriter name="wwWriter" class="..." />

<searchComponent name="wwSearcher" class="my.search.component.WWSearchComponent" />

<requestHandler name="/ww" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">wwWriter</str>
    <str name="...">ww_search_key</str>
    <int name="rows">10</int>
  </lst>
  <arr name="components">
    <str>wwSearcher</str>
  </arr>
</requestHandler>

Could anyone explain why a request to "/ww" does not include a call to
WWResponseWriter?
I just want to render custom JSON as output.

-- 
Greetz,

Markus Boese


Returning parent's field which searching for child

2016-08-09 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check: is it possible to return certain fields from the parent's
record when we are searching for fields that are only contained in child
records?

For example, for this query:
http://localhost:8983/solr/collection1/select?q=childField:123

The child field can only be found in child, so this search will not return
any parent's record. But if I want to return, say, its parent's ID
together with the above query, is it possible to be done?

I'm using Solr 6.1.0.

Regards,
Edwin


Re: Backup And Restore

2016-08-09 Thread Rainer Gnan
Hi Hardika,

The ASG says that a "shared drive" is necessary.
Probably you have to install an NFS server.
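
Once the directory is shared and writable by the Solr process, a backup call
might look like this (host, core name, and path are placeholders):

http://localhost:8983/solr/collection1/replication?command=backup&location=/mnt/shared/backups&name=mybackup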

Best regards,
Rainer



Rainer Gnan
Bayerische Staatsbibliothek 
BibliotheksVerbund Bayern
Verbundnahe Dienste
80539 München
Tel.: +49(0)89/28638-2445
Fax: +49(0)89/28638-2665
E-Mail: rainer.g...@bsb-muenchen.de




>>> Hardika Catur S  09.08.2016 09:40 >>>
Hi,

I am trying to create a snapshot with Solr backup and restore, but the
process fails with errors like this:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Unable to create snapshot directory: 
/root/backup/snapshot.20160809065729143

Please help me to find a solution.

Thanks,
Hardika CS.



Unique key field type in solr 6.1 schema

2016-08-09 Thread bharath.mvkumar
Hi All,

I have an issue with cross data center replication: when we delete a
document by id from the main site, the target site document is not deleted.
I have the id field, which is the unique field for my schema, configured
as "long".

If I change the type to "string", it works fine. Is there any issue using
long? We migrated from 4.4 to 6.1, and we had the id field as long.
Can you please help me with this? Really appreciate your help.

I see the below error on the target site:-

 o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid
Number:
  at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
at
org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
at
org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)

Thanks,
Bharath Kumar





Solr CDCR delete document issue on target site

2016-08-09 Thread bharath.mvkumar
Hi All,

I am using the CDCR solution available in SOLR 6.1 and I have set up
cross data center replication on both sites. When I add and update
documents on the main site, the data is replicated to the target site with
no issues. But when I delete a document on the main site, I see the below
errors. On the main site SOLR node that document gets deleted, but
on the target site we get an error while deleting that document.

Error stacktrace on main site SOLR node:-

org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from
server at http://:port_number/solr/collection: Invalid Number: 
^A^@^@^@^@^@^@C$U
at
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:697)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1109)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:998)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:934)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at
org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:135)
at
org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:99)
at
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$59(CdcrReplicatorScheduler.java:80)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Error stacktrace on the target site SOLR node leader:-

2016-08-06 08:09:21.091 ERROR (qtp472654579-2699) [c:collection s:shard1
r:core_node3 x:collection] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Invalid Number:  ^A^@^@^@^@^@^L^K0W
at
org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
at
org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
at
org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1154)
at
org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:151)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:112)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at

How to re-index SOLR data

2016-08-09 Thread bharath.mvkumar
Hi All,

What would be the best way to re-index the data in the SOLR cloud? We have
around 65 million documents and we are planning to change the schema by changing
the unique key type from long to string.

How long does it take to re-index 65 million documents in SOLR and can you
please suggest how to do that?

Thanks,
Bharath Kumar





Backup And Restore

2016-08-09 Thread Hardika Catur S

Hi,

I am trying to create a snapshot with Solr backup and restore, but the
process fails with errors like this:


org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Unable to create snapshot directory: 
/root/backup/snapshot.20160809065729143


Please help me to find a solution.

Thanks,
Hardika CS.