curl post taking time to solr server

2016-08-10 Thread Midas A
Hi ,

we are indexing to 2 cores, say core1 and core2, with the help of curl POSTs.
when we post, core1 takes much less time than core2,

while the doc size is the same on both servers.

it causes core2 indexing to be very slow. the only difference is core2 has a heavier
indexing rate; we index more docs/sec on core2.


What could be the reason?
How can i minimize the time of the curl post on the solr server?
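(Editor's sketch, not from the original thread: batching many documents into a single POST and using commitWithin instead of committing per request often reduces the per-post time; the host, core name and JSON fields below are placeholders.)

curl "http://localhost:8983/solr/core2/update?commitWithin=10000" \
  -H "Content-Type: application/json" \
  --data-binary '[
    {"id": "1", "title": "first doc"},
    {"id": "2", "title": "second doc"}
  ]'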


Corrupt Index Exception

2016-08-10 Thread Hardika Catur S

Hi,

I get this error when solr is indexing. After this error appears, no
collection can index to solr.


org.apache.lucene.index.CorruptIndexException: checksum failed (hardware 
problem?) : expected=7a1d93c3 actual=c44ec423 
(resource=BufferedChecksumIndexInput(RAMInputStream(name=_27zi.tvd)))


1. What causes this error to appear?
2. How can this error be solved?

Please help me find a solution.
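(Editor's sketch, not from the original thread: a checksum mismatch usually means the index files were damaged on disk, so check the disk/RAM first and, if possible, restore the core from a replica or backup. As a last resort Lucene's CheckIndex tool can drop the corrupt segments; the jar and index paths below are assumptions, and -exorcise (called -fix in older Lucene versions) loses the documents in the bad segments.)

# Stop Solr and back up the index directory first.
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-*.jar \
  org.apache.lucene.index.CheckIndex /var/solr/data/collection1/data/index

# Only if losing the affected documents is acceptable:
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-*.jar \
  org.apache.lucene.index.CheckIndex /var/solr/data/collection1/data/index -exorcise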

Thanks,
Hardika CS.


Re: How to re-index SOLR data

2016-08-10 Thread John Bickerstaff
Right...  SOLR doesn't work quite that way...

Keep in mind the value of the data import jar if you have the data from
MySQL stored in a text file, although that would require a little
programming to get the data into the proper format..

But once you get everything into a text file or similar, you don't have to
task your MySQL database when you want to reindex  Unless your data
changes frequently... in which case you'll probably have to hit MySQL every
time.

Good luck!

On Aug 10, 2016 6:24 PM, "Bharath Kumar"  wrote:

> Hi All,
>
> Thanks so much for your inputs. We have a MYSQL data source and i think we
> will try to re-index using the MYSQL data.
>
> I wanted something where i can export all my current data say to an excel
> file or some data source and then import it on another node with the same
> collection with empty data.
>
> On Tue, Aug 9, 2016 at 8:44 PM, Erick Erickson 
> wrote:
>
> > Assuming you can re-index
> >
> > Consider "collection aliasing". Say your current collection is C1.
> > Create C2 (using the same cluster, Zookeeper and the like). Go
> > ahead and index to C2 (however you do that). NOTE: the physical
> > machines may be _different_ than C1, or not. That's up to you. The
> > critical bit is that you use the same Zookeeper.
> >
> > Now, when you are done you use the Collections API CREATEALIAS
> > command to point a "pseudo collection" to C1 (call it "prod"). This is
> > seamless to the users.
> >
> > The flaw in my plan so far is that you probably go at Collection C1
> > directly. So what you might do is create the "prod" alias and point it at
> > C1. Now change your LB (or client or whatever) to use the "prod"
> > collection,
> > then when indexing is complete use CREATEALIAS to point "prod" at C2
> > instead.
> >
> > This is actually a quite well-tested process, often used when you want to
> > change "atomically", e.g. when you reindex the same data nightly but want
> > all the new data available in its entirety only after it has been QA'd or
> > such.
> >
> > Best,
> > Erick
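(Editor's sketch of the alias swap Erick describes, using the Collections API CREATEALIAS command; the host and collection names are examples.)

# Point the "prod" alias at the current collection C1:
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=prod&collections=C1"

# After reindexing into C2 finishes, repoint the alias in one atomic step:
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=prod&collections=C2"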
> >
> > On Tue, Aug 9, 2016 at 2:43 PM, John Bickerstaff
> >  wrote:
> > > In my case, I've done two things  neither of them involved taking
> the
> > > data from SOLR to SOLR...  although in my reading, I've seen that this
> is
> > > theoretically possible (I.E. sending data from one SOLR server to
> another
> > > SOLR server and  having the second SOLR instance re-index...)
> > >
> > > I haven't used the python script...  that was news to me, but it sounds
> > > interesting...
> > >
> > > What I've done is one of the following:
> > >
> > > a. Get the data from the original source (database, whatever) and
> massage
> > > it again so that i's ready for SOLR and then submit it to my new
> > SolrCloud
> > > for indexing.
> > >
> > > b. Keep a separate store of EVERY Solr document as it comes out of my
> > code
> > > (in xml) and store it in Kafka or a text file.  Then it's easy to push
> > back
> > > into another SOLR instance any time - multiple times if necessary.
> > >
> > > I'm guessing you don't have the data stored away as in "b"...  And if
> you
> > > don't have a way of getting the data from some central source, then "a"
> > > won't work either...  Which leaves you with the concept of sending data
> > > from SOLR "A" to SOLR "B" and having "B" reindex...
> > >
> > > This might serve as a starting point in that case...
> > > https://wiki.apache.org/solr/HowToReindex
> > >
> > > You'll note that there are limitations and a strong caveat against
> doing
> > > this with SOLR, but if you have no other option, then it's the best you
> > can
> > > do.
> > >
> > > Do you have the ability to get all the data again from an authoritative
> > > source?  (Relational Database or something similar?)
> > >
> > > On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar <
> bharath.mvku...@gmail.com
> > >
> > > wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks so much for your inputs. We have time to build another system.
> So
> > >> how did you index the same data on the main SOLR node to the new SOLR
> > node?
> > >> Did you use the re-index python script? The new data will be indexed
> > >> correctly with the new rules, but what about the old data?
> > >>
> > >> Our SOLR data is around 30GB with around 60 million documents. We use
> > SOLR
> > >> cloud with 3 solr nodes and 3 zookeepers.
> > >>
> > >> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > >> >
> > >> wrote:
> > >>
> > >> > In case this helps...
> > >> >
> > >> > Assuming you have the resources to build a copy of your production
> > >> > environment and assuming you have the time, you don't need to take
> > your
> > >> > production down - or even affect it's processing...
> > >> >
> > >> > What I've done (with admittedly smaller data sets) is build a
> separate
> > >> > environment (usually on VM's) and once it's set up, I do the new
> > 

Re: How to re-index SOLR data

2016-08-10 Thread Bharath Kumar
Hi All,

Thanks so much for your inputs. We have a MySQL data source and I think we
will try to re-index using the MySQL data.

I wanted something where I can export all my current data, say to an Excel
file or some other data store, and then import it on another node into the same
collection starting with empty data.

On Tue, Aug 9, 2016 at 8:44 PM, Erick Erickson 
wrote:

> Assuming you can re-index
>
> Consider "collection aliasing". Say your current collection is C1.
> Create C2 (using the same cluster, Zookeeper and the like). Go
> ahead and index to C2 (however you do that). NOTE: the physical
> machines may be _different_ than C1, or not. That's up to you. The
> critical bit is that you use the same Zookeeper.
>
> Now, when you are done you use the Collections API CREATEALIAS
> command to point a "pseudo collection" to C1 (call it "prod"). This is
> seamless to the users.
>
> The flaw in my plan so far is that you probably go at Collection C1
> directly. So what you might do is create the "prod" alias and point it at
> C1. Now change your LB (or client or whatever) to use the "prod"
> collection,
> then when indexing is complete use CREATEALIAS to point "prod" at C2
> instead.
>
> This is actually a quite well-tested process, often used when you want to
> change "atomically", e.g. when you reindex the same data nightly but want
> all the new data available in its entirety only after it has been QA'd or
> such.
>
> Best,
> Erick
>
> On Tue, Aug 9, 2016 at 2:43 PM, John Bickerstaff
>  wrote:
> > In my case, I've done two things  neither of them involved taking the
> > data from SOLR to SOLR...  although in my reading, I've seen that this is
> > theoretically possible (I.E. sending data from one SOLR server to another
> > SOLR server and  having the second SOLR instance re-index...)
> >
> > I haven't used the python script...  that was news to me, but it sounds
> > interesting...
> >
> > What I've done is one of the following:
> >
> > a. Get the data from the original source (database, whatever) and massage
> > it again so that it's ready for SOLR and then submit it to my new
> SolrCloud
> > for indexing.
> >
> > b. Keep a separate store of EVERY Solr document as it comes out of my
> code
> > (in xml) and store it in Kafka or a text file.  Then it's easy to push
> back
> > into another SOLR instance any time - multiple times if necessary.
> >
> > I'm guessing you don't have the data stored away as in "b"...  And if you
> > don't have a way of getting the data from some central source, then "a"
> > won't work either...  Which leaves you with the concept of sending data
> > from SOLR "A" to SOLR "B" and having "B" reindex...
> >
> > This might serve as a starting point in that case...
> > https://wiki.apache.org/solr/HowToReindex
> >
> > You'll note that there are limitations and a strong caveat against doing
> > this with SOLR, but if you have no other option, then it's the best you
> can
> > do.
> >
> > Do you have the ability to get all the data again from an authoritative
> > source?  (Relational Database or something similar?)
> >
> > On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar  >
> > wrote:
> >
> >> Hi John,
> >>
> >> Thanks so much for your inputs. We have time to build another system. So
> >> how did you index the same data on the main SOLR node to the new SOLR
> node?
> >> Did you use the re-index python script? The new data will be indexed
> >> correctly with the new rules, but what about the old data?
> >>
> >> Our SOLR data is around 30GB with around 60 million documents. We use
> SOLR
> >> cloud with 3 solr nodes and 3 zookeepers.
> >>
> >> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> >
> >> wrote:
> >>
> >> > In case this helps...
> >> >
> >> > Assuming you have the resources to build a copy of your production
> >> > environment and assuming you have the time, you don't need to take
> your
> >> > production down - or even affect it's processing...
> >> >
> >> > What I've done (with admittedly smaller data sets) is build a separate
> >> > environment (usually on VM's) and once it's set up, I do the new
> indexing
> >> > according to the new "rules"  (Like your change of long to string)
> >> >
> >> > Then, in a sense, I don't care how long it takes because it is not
> >> > affecting Prod.
> >> >
> >> > When it's done, I simply switch my load balancer to point to the new
> >> > environment and shut down the old one.
> >> >
> >> > To users, this could be seamless if you handle the load balancer
> >> correctly
> >> > and have it refuse new connections to the old servers while routing
> all
> >> new
> >> > connections to the new Solr servers...
> >> >
> >> > On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar <
> bharath.mvku...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > Hi Nick and Shawn,
> >> > >
> >> > > Thanks so much for the pointers. I will try that out. Thank you
> again!
> >> > >
> 

Re: AnalyticsQuery fails on a sharded collection

2016-08-10 Thread tedsolr
Quick update: the NPE was related to the way in which I passed params into
the Query via solrconfig.xml. It works fine for a single-shard collection, but
something about it was masking the unique ID field in a multi-shard
environment. Anyway, I was able to fix that by cleaning up the request
handler config:



{!AggregationPostFilter count=Count spend=INVOICE_AMOUNT}
[AggregationStats]

Now my post filter completes without errors (!) but it doesn't work - it
returns every single document specified by the query (q) param. It isn't
aggregating. (Broken record) It still works correctly on a single shard
collection. With this query, it should do exactly what the collapsing filter
does (and yes, that works perfectly):

.../aggr?q=*:*=VENDOR_NAME=VENDOR_NAME+asc



--
View this message in context: 
http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274p4291190.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AnalyticsQuery fails on a sharded collection

2016-08-10 Thread tedsolr
I still haven't found the reason for the NPE in my post filter when it runs
against a sharded collection, so I'm posting my code in the hopes that a
seasoned Solr pro might notice something. I thought perhaps not treating the
doc values as multi doc values when indexes are segmented might have been
the issue. But I optimized my test collection to merge the segments and the
search still fails in the same spot:

ERROR - 2016-08-10 09:03:20.249; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.common.SolrException;
null:java.lang.NullPointerException
at
org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1305)
at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)
at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:729)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388)


public class DocumentCollapsingCollector extends DelegatingCollector {
static final String AGGR_STATS = "AggregationStats";
static final String SORT_BY_SCORE = "SortByScore";
private static final String TOTAL_DOCS_STAT = "totalDocCount";
private final SolrQueryRequest req;
private final ResponseBuilder rb;
private final LeafReaderContext[] contexts;
private final FixedBitSet collapsedSet;
private final List<SortedDocValues> fieldValues;
private final NumericDocValues spendValues;
private final Map<FieldOrdinals, AggregationStats> aggregatedDocs;
private int docBase;
private final int maxDoc;
private final int numberOfFields;
private int totalDocs;
private final SearchPreProcessor.SortBy sortBy;

DocumentCollapsingCollector(int maxDoc, int segments, List<SortedDocValues> docValues,
        NumericDocValues spendValues, SolrQueryRequest req, ResponseBuilder rb) {

aggregatedDocs = new HashMap<>();
this.maxDoc = maxDoc;
contexts = new LeafReaderContext[segments];
collapsedSet = new FixedBitSet(maxDoc);
fieldValues = docValues;
numberOfFields = docValues.size();
this.spendValues = spendValues;
this.req = req;
this.rb = rb;
sortBy = (SearchPreProcessor.SortBy) 
req.getContext().get(SORT_BY_SCORE);
}

@Override
public void collect(int doc) throws IOException {
int globalDoc = doc + docBase;
int[] ords = new int[numberOfFields];

int i=0;
for (SortedDocValues vals : fieldValues) {
ords[i++] = vals.getOrd(globalDoc);
}

FieldOrdinals ordinals = new FieldOrdinals(ords);
AggregationStats stats = aggregatedDocs.get(ordinals);
if (stats != null) {
stats.bumpCount();

stats.addSpend(Double.longBitsToDouble(spendValues.get(globalDoc)));
} else {
aggregatedDocs.put(ordinals, new AggregationStats(globalDoc,
        Double.longBitsToDouble(spendValues.get(globalDoc))));
}
totalDocs++;
}

@Override
public boolean needsScores() {
return sortBy != null;
}

@Override
protected void doSetNextReader(LeafReaderContext context) throws
IOException {
contexts[context.ord] = context;
docBase = context.docBase;
}

@Override
public void finish() throws IOException {
if (contexts.length == 0) {
return;
}

for (AggregationStats docStats : aggregatedDocs.values()) {
collapsedSet.set(docStats.getDocId());
}

// saving the stats to the request context so that a doc transformer can pick them up
AggregationStatsArray stats = new
AggregationStatsArray(aggregatedDocs.values());
ImmutableSparseArray statsArray = new
ImmutableSparseArray(stats);
req.getContext().put(AGGR_STATS, statsArray);

int currentContext = 0;
int currentDocBase = 0;
int nextDocBase = currentContext+1 < contexts.length ?
contexts[currentContext+1].docBase : maxDoc;

super.leafDelegate =
super.delegate.getLeafCollector(contexts[currentContext]);
DummyScorer dummy = new DummyScorer();
super.leafDelegate.setScorer(dummy);

BitSetIterator it = new BitSetIterator(collapsedSet, 0L);
int docId = -1;

while ((docId = it.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) 
{
if 

RE: Typical Error with live SOLR nodes

2016-08-10 Thread Ritesh Kumar (Avanade)
Hi All,

Please provide some help in this issue.

Ritesh K
Infrastructure Sr. Engineer - Jericho Team
Sales & Marketing Digital Services
t +91-7799936921   v-kur...@microsoft.com


-Original Message-
From: Ritesh Kumar (Avanade) [mailto:v-kur...@microsoft.com] 
Sent: 10 August 2016 15:38
To: solr-user 
Subject: RE: Typical Error with live SOLR nodes

Hi All,

We have 3 ZK VM's and 3 Solr VM's with SOLR 6 and we have implemented CDCR. 
(windows) A dedicated drive has been setup for SOLR & ZK separately.

The issue we are facing is that 2 nodes show up together and 1 node shows up separately
in the same external ZooKeeper ensemble. Please note that restarting the ZK Windows services
or restarting the VMs temporarily fixes the issue.

Please find the solr status details from 2 SOLR VM's:

F:\solr-6.1.0\bin>solr status

Found Solr process 3612 running on port 8984
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:50:08.928Z",
  "uptime":"0 days, 0 hours, 9 minutes, 3 seconds",
  "memory":"150.2 MB (%3.8) of 3.8 GB",
  "cloud":{
"ZooKeeper":"ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"2",
"collections":"0"}}


F:\solr-6.1.0\bin>solr status

Found Solr process 5100 running on port 8983
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:49:28.868Z",
  "uptime":"0 days, 0 hours, 5 minutes, 9 seconds",
  "memory":"156.9 MB (%4) of 3.8 GB",
  "cloud":{
"ZooKeeper":" ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"1",
"collections":"0"}}


Ritesh K
Infrastructure Sr. Engineer - Jericho Team Sales & Marketing Digital Services
t +91-7799936921   v-kur...@microsoft.com





Re: Solr 6.1 :: language specific analysis

2016-08-10 Thread Susheel Kumar
BeiderMorse supports phonetic variations like Foto / Photo and has
support for many languages including German.  Please see
https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching

Thanks,
Susheel

On Wed, Aug 10, 2016 at 2:47 PM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Can you use Solr's synonym feature?  You can find a German synonym file
> here: https://sites.google.com/site/kevinbouge/synonyms-lists
>
> Alexandre Drouin
>
> -Original Message-
> From: Rainer Gnan [mailto:rainer.g...@bsb-muenchen.de]
> Sent: Wednesday, August 10, 2016 10:21 AM
> To: solr-user@lucene.apache.org
> Subject: Solr 6.1 :: language specific analysis
>
> Hello,
>
> I wonder if solr offers a feature (class) to handle different orthography
> versions?
> For the German language for example ... in order to find the same
> documents when searching for "Foto" or "Photo".
>
> I appreciate any help!
>
> Rainer
>
>
> 
> Rainer Gnan
> Bayerische Staatsbibliothek
> BibliotheksVerbund Bayern
> Verbundnahe Dienste
> 80539 München
> Tel.: +49(0)89/28638-2445
> Fax: +49(0)89/28638-2665
> E-Mail: rainer.g...@bsb-muenchen.de
> 
>
>
>
>


RE: Solr 6.1 :: language specific analysis

2016-08-10 Thread Alexandre Drouin
Can you use Solr's synonym feature?  You can find a German synonym file here: 
https://sites.google.com/site/kevinbouge/synonyms-lists

Alexandre Drouin

-Original Message-
From: Rainer Gnan [mailto:rainer.g...@bsb-muenchen.de] 
Sent: Wednesday, August 10, 2016 10:21 AM
To: solr-user@lucene.apache.org
Subject: Solr 6.1 :: language specific analysis

Hello,

I wonder if solr offers a feature (class) to handle different orthography
versions?
For the German language for example ... in order to find the same documents
when searching for "Foto" or "Photo".

I appreciate any help!

Rainer



Rainer Gnan
Bayerische Staatsbibliothek 
BibliotheksVerbund Bayern
Verbundnahe Dienste
80539 München
Tel.: +49(0)89/28638-2445
Fax: +49(0)89/28638-2665
E-Mail: rainer.g...@bsb-muenchen.de






RE: Solr 6.1 :: language specific analysis

2016-08-10 Thread Allison, Timothy B.
ICU normalization (ICUFoldingFilterFactory) will at least handle "ß" -> "ss" 
(IIRC) and some other language-general variants that might get you close.  
There are, of course, language specific analyzers 
(https://wiki.apache.org/solr/LanguageAnalysis#German) , but I don't think 
they'll get you Foto->photo.  

You might experiment with DoubleMetaphone encoding 
(DoubleMetaphoneFilterFactory) or, worst case, back off to synonym lists 
(SynonymFilterFactory) for your domain.
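(Editor's sketch of the synonym fallback combined with ICU folding; the field type name and the synonyms_de.txt file are assumptions, and synonyms_de.txt would contain a line such as "Foto,Photo". Note that ICUFoldingFilterFactory needs the analysis-extras contrib jars on the classpath.)

<fieldType name="text_de_variants" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_de.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>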

-Original Message-
From: Rainer Gnan [mailto:rainer.g...@bsb-muenchen.de] 
Sent: Wednesday, August 10, 2016 10:21 AM
To: solr-user@lucene.apache.org
Subject: Solr 6.1 :: language specific analysis

Hello,

I wonder if solr offers a feature (class) to handle different orthography
versions?
For the German language for example ... in order to find the same documents
when searching for "Foto" or "Photo".

I appreciate any help!

Rainer



Rainer Gnan
Bayerische Staatsbibliothek 
BibliotheksVerbund Bayern
Verbundnahe Dienste
80539 München
Tel.: +49(0)89/28638-2445
Fax: +49(0)89/28638-2665
E-Mail: rainer.g...@bsb-muenchen.de






Solr 6.1 :: language specific analysis

2016-08-10 Thread Rainer Gnan
Hello,

I wonder if solr offers a feature (class) to handle different orthography
versions?
For the German language for example ... in order to find the same documents
when searching for "Foto" or "Photo".

I appreciate any help!

Rainer



Rainer Gnan
Bayerische Staatsbibliothek 
BibliotheksVerbund Bayern
Verbundnahe Dienste
80539 München
Tel.: +49(0)89/28638-2445
Fax: +49(0)89/28638-2665
E-Mail: rainer.g...@bsb-muenchen.de






Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
Thanks Alexandre,

I solved the problem using the xslt transform and the /update handler.

I attach the xsl that I put in conf/xslt/ (for documentation)

Then the command:
curl "http://192.168.99.100:8999/solr/solrexchange/update?commit=true&tr=updateXmlSolrExchange.xsl" \
  -H "Content-Type: text/xml" --data-binary @./solr/data/search/dih/data_search.xml

It is a shame that DIH can not be used with the schemaless config. I hope this 
will be possible in the future.

Thanks,
Pierre


> On 10 Aug 2016, at 19:02, Alexandre Rafalovitch  wrote:
> 
> Seem you might be right, according to the source:
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java#L662
> 
> Sometimes, the magic (and schemaless is rather magical) fails when
> combined with older assumptions (and DIH is kind of legacy).
> 
> You can still declare dynamic fields and use preffix/suffix to map to
> the types. That would work just fine and avoid guessing.
> 
> Or you could use API to predefine the fields in the schema.
> 
> Or use the POST method with XSLT preprocessor (yes, Solr has that too
> somewhere).
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 18:42, Pierre Caserta  wrote:
>> I am rebuilding a new docker image with each change on the config file so 
>> solr starts fresh every time.
>> 
>>  > class="solr.DataImportHandler">
>>  
>>add-unknown-fields-to-the-schema
>>solr-data-config.xml
>>  
>>  
>> 
>> still having document like such:
>> 
>> "response":{"numFound":8,"start":0,"docs":[
>>  {
>>"id":"38822",
>>"_version_":1542264667720646656},
>>  {
>> 
>> If add add the Body field using the Schema section of the Admin UI, This 
>> field is getting indexed during the dataimport.
>> It seems that solr.DataImportHandler does not allow the 
>> add-unknown-fields-to-the-schema update.chain.
>> 
>> Pierre
>> 
>>> On 10 Aug 2016, at 18:33, Alexandre Rafalovitch  wrote:
>>> 
>>> Ok, to reduce the magic, you can just stick "update.chain" parameter
>>> inside the defaults of the dataimport handler directly.
>>> 
>>> You can also pass it just as a URL parameter. That's what 'defaults'
>>> section mean.
>>> 
>>> And, just to be paranoid, you did reload the core after each of those
>>> changes to test it? These are not picked up automatically.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>> 
>>> 
>>> On 10 August 2016 at 18:28, Pierre Caserta  wrote:
 It did not work,
 I tried many things and ended up trying this:
 
 >>> class="solr.DataImportHandler">
 
   solr-data-config.xml
 
 
 
   
 add-unknown-fields-to-the-schema
   
 
 
 Regards,
 Pierre
 
> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch  
> wrote:
> 
> Your initParams section does not apply to /dataimport handler as
> defined. Try modifying it to say:
> path="/update/**,/dataimport"
> 
> Hopefully, that's all that takes.
> 
> Managed schema is enabled by default, but schemaless mode is the next
> layer on top. With managed schema, you can use the API to add your
> fields (or new Admin UI in the Schema screen). With schemaless mode,
> it tries to guess the field type as it adds it automatically.
> 
> 
> Regards,
>  Alex.
> 
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 18:04, Pierre Caserta  
> wrote:
>> Hi Alex,
>> thanks for your answer.
>> 
>> Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
>> 
>> 
>>  
>>add-unknown-fields-to-the-schema
>>  
>> 
>> 
>> I created my core using this command:
>> 
>> curl 
>> http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
>> 
>> I am using the example configset data_driven_schema_configs and I simply 
>> added:
>> 
>> > regex="solr-dataimporthandler-.*\.jar" />
>> 
>>
>>  data-config.xml
>>
>> 
>> 
>> I thought the schemaless mode was enable by default but I also tried 
>> adding this config but I get the same result.
>> 
>> 
>>  true
>>  managed-schema
>> 
>> 
>> How can I update my schemaless URP chain and add the parameter to call 
>> it to DIH?
>> 
>> 
>>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  
>>> wrote:

Re: Solr and Drupal

2016-08-10 Thread Charlie Hull

On 09/08/2016 18:11, Rose, John B wrote:

We are looking at Solr for a Drupal web site. We have never installed Solr.


From my readings it is not clear exactly what we need to implement a search in 
Drupal with Solr. Some sites have implied Lucene and/or Tomcat are needed.


Can someone point me to the site that explains minimally what is needed to 
implement Solr within Drupal?


Thanks for your time


Hi John,

We have a couple of ongoing projects involving Drupal & Solr. Although 
the latest Drupal Solr modules are OK, we tend to find that the 
Drupal-ish way of thinking doesn't always translate to the best Solr 
search experience: you can for example end up with very complex queries 
being sent to Solr, everything being re-indexed every time a single 
Drupal node is updated (we're actually working on a solution for this 
for very large collections) or some very odd defaults being set in the 
Solr configuration files. This is a generic issue when Solr or another 
search engine is embedded in another product - the people doing the 
embedding may not know enough about search to do it right.


In any case, you'll probably be fine, but do be aware.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Help for -- Filter in the text field + highlight + no affect on boosting(if done with q instead of fq)

2016-08-10 Thread Emir Arnautovic

Hi Mayur,

Not sure if I get your case completely, but if you need query but not 
sorted by score, you can use boost factors 0 in your edismax definition 
(e.g. qf=title^0) or you can order by doc id (sort= _docid_ asc)
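(Editor's sketch of the two options Emir mentions; the host, core and field names are placeholders.)

# Match on q without letting the match influence ordering: zero out the qf boost
# and sort by internal doc id instead of score.
curl "http://localhost:8983/solr/reviews/select?defType=edismax&q=text:good&qf=text^0&sort=_docid_+asc&wt=json"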


HTH,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 09.08.2016 19:10, Raleraskar, Mayur wrote:

Hi All,
I am using Solr for search functionality here @ eBay reviews team.

I need to implement search functionality with the q parameter but do not want it
to affect boosting or relevancy. How can I achieve that? Effectively I want it to
perform just like a filter.
My query is like
SolrIp:Port/select?defType=edismax=text%3Agood=max%28relevanceScore_dx%2C+0.1%29=recip%28abs%28ms%28NOW%2FYEAR%2B1YEAR%2ClastEditedDate_dt%29%29%2C+3.16e-11%2C1%2C1%29=0=5=true=%7B%21ex%3Dlab_ix%2Clocale_sx%7Drating_ix=%7B%21ex%3Dlab_ix%7Dlabel_ix=count=0=100=0=status_ix%3A1=%7B%21tag%3Dlab_ix%7Dlabel_ix%3A2=siteId_ix%3A0=subjectReferenceId_lx%3A1040409165+AND+subjectType_sx%3AP=json=true=id=true

OR
I can search/filter with the fq parameter but I need to highlight the words which are
matched by fq. Just the words in the text which match the fq regex, not the
entire text field.
My query is like
SolrIp:Port/select?defType=edismax=*%3A*=max%28relevanceScore_dx%2C+0.1%29=recip%28abs%28ms%28NOW%2FYEAR%2B1YEAR%2ClastEditedDate_dt%29%29%2C+3.16e-11%2C1%2C1%29=0=50=true=%7B%21ex%3Dlab_ix%2Clocale_sx%7Drating_ix=%7B%21ex%3Dlab_ix%7Dlabel_ix=count=0=100=0=status_ix%3A1=%7B%21tag%3Dlab_ix%7Dlabel_ix%3A2=siteId_ix%3A0=subjectReferenceId_lx%3A1040409165+AND+subjectType_sx%3AP=text%3Agood=json=true=id


Thanks in advance,
Mayur






Re: display filter based on existence of facet

2016-08-10 Thread Emir Arnautovic

Hi Derek,

Not sure if there is some shortcut but you could try setting 
facet.sort=index and for sure use facet.limit=1.
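(Editor's sketch of the per-field overrides Emir suggests, applied only to the Min Order Qty facet so the rest of the request is unchanged; the field name is taken from Derek's query below.)

...&facet=true
   &facet.field=P_MinOrderQty
   &f.P_MinOrderQty.facet.limit=1
   &f.P_MinOrderQty.facet.mincount=1
   &f.P_MinOrderQty.facet.sort=index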


Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 10.08.2016 09:32, Derek Poh wrote:
I have a couple of filters that are text-input based, where the user will
input a value into the text boxes of these filters.
The condition is that these filters will only be displayed if the facets
exist in the search result.
E.g. the Min Order Qty filter will be displayed if the Min Order Qty facet
exists in the solr result.


To display this filter, I only need to 'know' there is a value to filter on.
Currently all the possible terms and counts of the Min Order Qty field
are returned for this facet.


Any suggestions on how I can avoid the computation of the possible
terms and their counts for the facet field and hence reduce the
computational time of the query?

I just need to know there is 'a value to filter on'.

This is the parameters of the query that is use to display the list of 
filters.
group.field=P_SupplierId=true=true=0=0=coffee=P_SupplierSource:(1)=true=1=P_CNState=P_BusinessType=P_CombinedBusTypeFlat=P_CombinedCompCertFlat=P_CombinedExportCountryFlat=P_CombinedProdCertFlat=P_Country=P_CSFParticipant=P_FOBPriceMinFlag=P_FOBPriceMaxFlag=P_HasAuditInfo=P_HasCreditInfo=P_LeadTime=P_Microsite=P_MinOrderQty=P_MonthlyCapacityFlag=P_OEMServices=P_PSEParticipant=P_SupplierRanking=P_SupplierUpcomingTradeShow=P_YearsInBusiness=P_SmallOrderFlag 



Using solr 4.10.4

Thankyou,
Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential 
and/or privileged information. If you are not the intended recipient 
or have received this e-mail in error, please inform the sender 
immediately and delete this e-mail (including any attachments) from 
your computer, and you must not use, disclose to anyone else or copy 
this e-mail (including any attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.


Re: Solr 5.2.1 heap issues

2016-08-10 Thread Emir Arnautovic

Hi Preeti,

3GB heap is too small for such setup. I would try 10-15GB, but that 
depends on usage patterns. You have 50GB machine and assuming that you 
do not run anything other than solr you have 30GB to spare for Solr and
still leave enough for the OS to cache the entire index.
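(Editor's sketch: on a standard install the heap is usually raised in solr.in.sh / solr.in.cmd rather than on the java command line; 12g is only an example within the 10-15GB range suggested above.)

# in bin/solr.in.sh (or /etc/default/solr.in.sh on a service install)
SOLR_HEAP="12g"
# or, equivalently:
SOLR_JAVA_MEM="-Xms12g -Xmx12g"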


The best way to do heap tuning is to monitor it. You can use standard 
java tools, but if you prefer to get more insight into Solr/OS behavior, 
you should use proper monitoring solution. There are several cloud 
solutions that you can use even for free on small Solr setups. One such 
product is our SPM .


HTH,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 10.08.2016 12:41, preeti kumari wrote:

Hi,

I am using solr 5.2.1 in cloud mode with 3 shards on 3 different servers.

Each server is having 20 GB of data size . Total memory on each server is
around 50 GB.
Continuos updates and queries are being fired to solr.
We have been facing OOM issues due to heap issues.

args we use: giving 3 GB of max heap space on each solr server

java -server -Xss256k* -Xms3g -Xmx3g* -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
*-XX:ParallelGCThreads=4* -XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled
-XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 -verbose:gc
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime 

Please do let me know if these parameters values are enough for solr
running without OOM.
Let me know how can I fix these OOM issues.

Thanks
Preeti





Solr 5.2.1 heap issues

2016-08-10 Thread preeti kumari
Hi,

I am using solr 5.2.1 in cloud mode with 3 shards on 3 different servers.

Each server has around 20 GB of index data. Total memory on each server is
around 50 GB.
Continuous updates and queries are being fired at solr.
We have been facing OOM issues due to heap issues.

args we use: giving 3 GB of max heap space on each solr server

java -server -Xss256k -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
-XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled
-XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 -verbose:gc
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime 

Please do let me know if these parameters values are enough for solr
running without OOM.
Let me know how can I fix these OOM issues.

Thanks
Preeti


RE: Typical Error with live SOLR nodes

2016-08-10 Thread Ritesh Kumar (Avanade)
Hi All,

We have 3 ZK VM's and 3 Solr VM's with SOLR 6 and we have implemented CDCR. 
(windows) A dedicated drive has been setup for SOLR & ZK separately.

The issue we are facing is that 2 nodes show up together and 1 node shows up separately
in the same external ZooKeeper ensemble. Please note that restarting the ZK Windows services
or restarting the VMs temporarily fixes the issue.

Please find the solr status details from 2 SOLR VM's:

F:\solr-6.1.0\bin>solr status

Found Solr process 3612 running on port 8984
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:50:08.928Z",
  "uptime":"0 days, 0 hours, 9 minutes, 3 seconds",
  "memory":"150.2 MB (%3.8) of 3.8 GB",
  "cloud":{
"ZooKeeper":"ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"2",
"collections":"0"}}


F:\solr-6.1.0\bin>solr status

Found Solr process 5100 running on port 8983
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:49:28.868Z",
  "uptime":"0 days, 0 hours, 5 minutes, 9 seconds",
  "memory":"156.9 MB (%4) of 3.8 GB",
  "cloud":{
"ZooKeeper":" ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"1",
"collections":"0"}}


Ritesh K
Infrastructure Sr. Engineer - Jericho Team
Sales & Marketing Digital Services
t +91-7799936921   v-kur...@microsoft.com





Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Seem you might be right, according to the source:
https://github.com/apache/lucene-solr/blob/master/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java#L662

Sometimes, the magic (and schemaless is rather magical) fails when
combined with older assumptions (and DIH is kind of legacy).

You can still declare dynamic fields and use prefix/suffix to map to
the types. That would work just fine and avoid guessing.
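(Editor's sketch of the dynamic-field approach; the suffixes below follow the convention in Solr's example schemas, and the type names must match fieldTypes already defined in your schema.)

<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>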

Or you could use API to predefine the fields in the schema.

Or use the POST method with XSLT preprocessor (yes, Solr has that too
somewhere).

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 August 2016 at 18:42, Pierre Caserta  wrote:
> I am rebuilding a new docker image with each change on the config file so 
> solr starts fresh every time.
>
>class="solr.DataImportHandler">
>   
> add-unknown-fields-to-the-schema
> solr-data-config.xml
>   
>   
>
> still having document like such:
>
> "response":{"numFound":8,"start":0,"docs":[
>   {
> "id":"38822",
> "_version_":1542264667720646656},
>   {
>
> If add add the Body field using the Schema section of the Admin UI, This 
> field is getting indexed during the dataimport.
> It seems that solr.DataImportHandler does not allow the 
> add-unknown-fields-to-the-schema update.chain.
>
> Pierre
>
>> On 10 Aug 2016, at 18:33, Alexandre Rafalovitch  wrote:
>>
>> Ok, to reduce the magic, you can just stick "update.chain" parameter
>> inside the defaults of the dataimport handler directly.
>>
>> You can also pass it just as a URL parameter. That's what 'defaults'
>> section mean.
>>
>> And, just to be paranoid, you did reload the core after each of those
>> changes to test it? These are not picked up automatically.
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 10 August 2016 at 18:28, Pierre Caserta  wrote:
>>> It did not work,
>>> I tried many things and ended up trying this:
>>>
>>>  >> class="solr.DataImportHandler">
>>>  
>>>solr-data-config.xml
>>>  
>>>  
>>>  
>>>
>>>  add-unknown-fields-to-the-schema
>>>
>>>  
>>>
>>> Regards,
>>> Pierre
>>>
 On 10 Aug 2016, at 18:08, Alexandre Rafalovitch  wrote:

 Your initParams section does not apply to /dataimport handler as
 defined. Try modifying it to say:
 path="/update/**,/dataimport"

 Hopefully, that's all that takes.

 Managed schema is enabled by default, but schemaless mode is the next
 layer on top. With managed schema, you can use the API to add your
 fields (or new Admin UI in the Schema screen). With schemaless mode,
 it tries to guess the field type as it adds it automatically.


 Regards,
   Alex.

 
 Newsletter and resources for Solr beginners and intermediates:
 http://www.solr-start.com/


 On 10 August 2016 at 18:04, Pierre Caserta  
 wrote:
> Hi Alex,
> thanks for your answer.
>
> Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
>
> 
>   
> add-unknown-fields-to-the-schema
>   
> 
>
> I created my core using this command:
>
> curl 
> http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
>
> I am using the example configset data_driven_schema_configs and I simply 
> added:
>
>  regex="solr-dataimporthandler-.*\.jar" />
> 
> 
>   data-config.xml
> 
> 
>
> I thought the schemaless mode was enable by default but I also tried 
> adding this config but I get the same result.
>
> 
>   true
>   managed-schema
> 
>
> How can I update my schemaless URP chain and add the parameter to call it 
> to DIH?
>
>
>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  
>> wrote:
>>
>> Do you have the actual fields defined? If not, then I am guessing that
>> your 'post' test was against a different collection that had
>> schemaless mode enabled and your DIH one is against one where
>> schemaless mode is not enabled (look for
>> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
>> Solr examples for DIH do not have schemaless mode enabled.
>>
>> I _believe_ you can copy the schemaless URP chain and add the
>> parameter to call it to DIH handler and it _should_ work. But I am not
>> betting on it without testing it, as DIH also has some magic code to
>> ignore fields not defined in schema because it is designed to work
>> with only extracting 

Re: commit it taking 1300 ms

2016-08-10 Thread Emir Arnautovic

Hi Midas,

According to your autocommit configuration and your worry about commit 
time I assume that you are doing explicit commits from client code and 
that 1.3s is the client-observed commit time. If that is the case, then it
might be opening the searcher that is taking the time.


How do you index data - single threaded or multithreaded? How frequently 
do you commit from client? Can you let Solr do soft commits instead of 
explicitly committing? Do you have warmup queries? Is this SolrCloud? 
What is number of servers (what spec), shards, docs?
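(Editor's sketch of letting Solr handle visibility instead of explicit client commits: keep the hard autoCommit with openSearcher=false and give autoSoftCommit a positive interval; 5000 ms is only an example value.)

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>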


In any case monitoring can give you more info about server/Solr behavior 
and help you diagnose issues more easily/precisely. One such monitoring 
tool is our SPM .


Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 10.08.2016 05:20, Midas A wrote:

Thanks for replying

index size:9GB
2000 docs/sec.

Actually earlier it was taking less but suddenly it has increased .

Currently we do not have any monitoring  tool.

On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Midas,

Can you give us more details on your index: size, number of new docs
between commits. Why do you think 1.3s for commit is to much and why do you
need it to take less? Did you do any system/Solr monitoring?

Emir


On 09.08.2016 14:10, Midas A wrote:


please reply it is urgent.

On Tue, Aug 9, 2016 at 11:17 AM, Midas A  wrote:

Hi ,

commit is taking more than 1300 ms . what should i check on server.

below is my configuration .

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
I am rebuilding a new docker image with each change on the config file so solr 
starts fresh every time.

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
    <str name="config">solr-data-config.xml</str>
  </lst>
</requestHandler>

still having document like such:

"response":{"numFound":8,"start":0,"docs":[
  {
"id":"38822",
"_version_":1542264667720646656},
  {

If add add the Body field using the Schema section of the Admin UI, This field 
is getting indexed during the dataimport.
It seems that solr.DataImportHandler does not allow the 
add-unknown-fields-to-the-schema update.chain.

Pierre

> On 10 Aug 2016, at 18:33, Alexandre Rafalovitch  wrote:
> 
> Ok, to reduce the magic, you can just stick "update.chain" parameter
> inside the defaults of the dataimport handler directly.
> 
> You can also pass it just as a URL parameter. That's what 'defaults'
> section mean.
> 
> And, just to be paranoid, you did reload the core after each of those
> changes to test it? These are not picked up automatically.
> 
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 18:28, Pierre Caserta  wrote:
>> It did not work,
>> I tried many things and ended up trying this:
>> 
>>  > class="solr.DataImportHandler">
>>  
>>solr-data-config.xml
>>  
>>  
>>  
>>
>>  add-unknown-fields-to-the-schema
>>
>>  
>> 
>> Regards,
>> Pierre
>> 
>>> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch  wrote:
>>> 
>>> Your initParams section does not apply to /dataimport handler as
>>> defined. Try modifying it to say:
>>> path="/update/**,/dataimport"
>>> 
>>> Hopefully, that's all that takes.
>>> 
>>> Managed schema is enabled by default, but schemaless mode is the next
>>> layer on top. With managed schema, you can use the API to add your
>>> fields (or new Admin UI in the Schema screen). With schemaless mode,
>>> it tries to guess the field type as it adds it automatically.
>>> 
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> 
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>> 
>>> 
>>> On 10 August 2016 at 18:04, Pierre Caserta  wrote:
 Hi Alex,
 thanks for your answer.
 
 Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
 
 
   
 add-unknown-fields-to-the-schema
   
 
 
 I created my core using this command:
 
 curl 
 http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
 
 I am using the example configset data_driven_schema_configs and I simply 
 added:
 
 >>> regex="solr-dataimporthandler-.*\.jar" />
 
 
   data-config.xml
 
 
 
 I thought the schemaless mode was enable by default but I also tried 
 adding this config but I get the same result.
 
 
   true
   managed-schema
 
 
 How can I update my schemaless URP chain and add the parameter to call it 
 to DIH?
 
 
> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  
> wrote:
> 
> Do you have the actual fields defined? If not, then I am guessing that
> your 'post' test was against a different collection that had
> schemaless mode enabled and your DIH one is against one where
> schemaless mode is not enabled (look for
> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
> Solr examples for DIH do not have schemaless mode enabled.
> 
> I _believe_ you can copy the schemaless URP chain and add the
> parameter to call it to DIH handler and it _should_ work. But I am not
> betting on it without testing it, as DIH also has some magic code to
> ignore fields not defined in schema because it is designed to work
> with only extracting relevant fields from the database even with
> 'select *' statement.
> 
> 
> Regards,
> Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 17:12, Pierre Caserta  
> wrote:
>> Hi,
>> It seems that using the DataImportHandler with a XPathEntityProcessor 
>> config
>> with a managed-schema setup, only import the id and version field.
>> 
>> data-config.xml
>> 
>> 
>>  
>>  
>>  >  processor="XPathEntityProcessor"
>>  stream="true"
>>  forEach="/posts/row/"
>>  url="${dataimporter.request.dataurl}"
>> 
>> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
>>> 
>>  
>>  
>>  > xpath="/posts/row/@AcceptedAnswerId" />
>>  

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Ok, to reduce the magic, you can just stick "update.chain" parameter
inside the defaults of the dataimport handler directly.

You can also pass it just as a URL parameter. That's what the 'defaults'
section means.
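(Editor's sketch of both options, using the handler, config file and chain names from this thread; the host is a placeholder.)

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">solr-data-config.xml</str>
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</requestHandler>

# or per request, as a URL parameter:
curl "http://localhost:8983/solr/solrexchange/dataimport?command=full-import&update.chain=add-unknown-fields-to-the-schema"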

And, just to be paranoid, you did reload the core after each of those
changes to test it? These are not picked up automatically.

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 August 2016 at 18:28, Pierre Caserta  wrote:
> It did not work,
> I tried many things and ended up trying this:
>
>class="solr.DataImportHandler">
>   
> solr-data-config.xml
>   
>   
>   
> 
>   add-unknown-fields-to-the-schema
> 
>   
>
> Regards,
> Pierre
>
>> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch  wrote:
>>
>> Your initParams section does not apply to /dataimport handler as
>> defined. Try modifying it to say:
>> path="/update/**,/dataimport"
>>
>> Hopefully, that's all that takes.
>>
>> Managed schema is enabled by default, but schemaless mode is the next
>> layer on top. With managed schema, you can use the API to add your
>> fields (or new Admin UI in the Schema screen). With schemaless mode,
>> it tries to guess the field type as it adds it automatically.
>>
>>
>> Regards,
>>Alex.
>>
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 10 August 2016 at 18:04, Pierre Caserta  wrote:
>>> Hi Alex,
>>> thanks for your answer.
>>>
>>> Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
>>>
>>>  
>>>
>>>  add-unknown-fields-to-the-schema
>>>
>>>  
>>>
>>> I created my core using this command:
>>>
>>> curl 
>>> http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
>>>
>>> I am using the example configset data_driven_schema_configs and I simply 
>>> added:
>>>
>>>  >> regex="solr-dataimporthandler-.*\.jar" />
>>>  
>>>  
>>>data-config.xml
>>>  
>>>  
>>>
>>> I thought the schemaless mode was enable by default but I also tried adding 
>>> this config but I get the same result.
>>>
>>>  
>>>true
>>>managed-schema
>>>  
>>>
>>> How can I update my schemaless URP chain and add the parameter to call it 
>>> to DIH?
>>>
>>>
 On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  wrote:

 Do you have the actual fields defined? If not, then I am guessing that
 your 'post' test was against a different collection that had
 schemaless mode enabled and your DIH one is against one where
 schemaless mode is not enabled (look for
 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
 Solr examples for DIH do not have schemaless mode enabled.

 I _believe_ you can copy the schemaless URP chain and add the
 parameter to call it to DIH handler and it _should_ work. But I am not
 betting on it without testing it, as DIH also has some magic code to
 ignore fields not defined in schema because it is designed to work
 with only extracting relevant fields from the database even with
 'select *' statement.


 Regards,
  Alex.
 
 Newsletter and resources for Solr beginners and intermediates:
 http://www.solr-start.com/


 On 10 August 2016 at 17:12, Pierre Caserta  
 wrote:
> Hi,
> It seems that using the DataImportHandler with a XPathEntityProcessor 
> config
> with a managed-schema setup, only import the id and version field.
>
> data-config.xml
>
> 
>   
>   
>      processor="XPathEntityProcessor"
>   stream="true"
>   forEach="/posts/row/"
>   url="${dataimporter.request.dataurl}"
>
> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
>>
>   
>   
>    xpath="/posts/row/@AcceptedAnswerId" />
>    dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>   
>   
>    />
>   
>    xpath="/posts/row/@LastEditorUserId" />
>    xpath="/posts/row/@LastEditorDisplayName" />
>    xpath="/posts/row/@LastActivityDate"
> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>   
>    regex="(.*)" />
>    splitBy="" />
>   
>    />
>    />
>    xpath="/posts/row/@CommunityOwnedDate"
> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>   
>   
> 
>
>
> http://192.168.99.100:8999/solr/solrexchange/select?indent=on=*:*=json
> {
> "responseHeader":{
>   "status":0,
>   "QTime":0,
>   

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
It did not work,
I tried many things and ended up trying this:

  
  
solr-data-config.xml
  
  
  

  add-unknown-fields-to-the-schema

  

Regards,
Pierre

> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch  wrote:
> 
> Your initParams section does not apply to /dataimport handler as
> defined. Try modifying it to say:
> path="/update/**,/dataimport"
> 
> Hopefully, that's all that takes.
> 
> Managed schema is enabled by default, but schemaless mode is the next
> layer on top. With managed schema, you can use the API to add your
> fields (or new Admin UI in the Schema screen). With schemaless mode,
> it tries to guess the field type as it adds it automatically.
> 
> 
> Regards,
>Alex.
> 
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 18:04, Pierre Caserta  wrote:
>> Hi Alex,
>> thanks for your answer.
>> 
>> Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
>> 
>>  
>>
>>  add-unknown-fields-to-the-schema
>>
>>  
>> 
>> I created my core using this command:
>> 
>> curl 
>> http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
>> 
>> I am using the example configset data_driven_schema_configs and I simply 
>> added:
>> 
>>  > regex="solr-dataimporthandler-.*\.jar" />
>>  
>>  
>>data-config.xml
>>  
>>  
>> 
>> I thought the schemaless mode was enable by default but I also tried adding 
>> this config but I get the same result.
>> 
>>  
>>true
>>managed-schema
>>  
>> 
>> How can I update my schemaless URP chain and add the parameter to call it to 
>> DIH?
>> 
>> 
>>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  wrote:
>>> 
>>> Do you have the actual fields defined? If not, then I am guessing that
>>> your 'post' test was against a different collection that had
>>> schemaless mode enabled and your DIH one is against one where
>>> schemaless mode is not enabled (look for
>>> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
>>> Solr examples for DIH do not have schemaless mode enabled.
>>> 
>>> I _believe_ you can copy the schemaless URP chain and add the
>>> parameter to call it to DIH handler and it _should_ work. But I am not
>>> betting on it without testing it, as DIH also has some magic code to
>>> ignore fields not defined in schema because it is designed to work
>>> with only extracting relevant fields from the database even with
>>> 'select *' statement.
>>> 
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>> 
>>> 
>>> On 10 August 2016 at 17:12, Pierre Caserta  wrote:
 Hi,
 It seems that using the DataImportHandler with a XPathEntityProcessor 
 config
 with a managed-schema setup, only import the id and version field.
 
 data-config.xml
 
 
   
   
   >>>   processor="XPathEntityProcessor"
   stream="true"
   forEach="/posts/row/"
   url="${dataimporter.request.dataurl}"
 
 transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
> 
   
   
   >>> xpath="/posts/row/@AcceptedAnswerId" />
   >>> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
   
   
   >>> />
   
   >>> xpath="/posts/row/@LastEditorUserId" />
   >>> xpath="/posts/row/@LastEditorDisplayName" />
   >>> xpath="/posts/row/@LastActivityDate"
 dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
   
   >>> regex="(.*)" />
   >>> splitBy="" />
   
   >>> />
   >>> />
   >>> xpath="/posts/row/@CommunityOwnedDate"
 dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
   
   
 
 
 
 http://192.168.99.100:8999/solr/solrexchange/select?indent=on=*:*=json
 {
 "responseHeader":{
   "status":0,
   "QTime":0,
   "params":{
 "q":"*:*",
 "indent":"on",
 "wt":"json",
 "_":"1470811193595"}},
 "response":{"numFound":8,"start":0,"docs":[
 {
   "id":"38822",
   "_version_":1542258196375142400},
 {
   "id":"38836",
   "_version_":1542258196387725312},
 {
   "id":"63896",
   "_version_":1542258196388773888},
 {
   "id":"65406",
   "_version_":1542258196391919616},
 {
   "id":"1357173",
   "_version_":1542258196391919617},
 {
   "id":"5339763",
   "_version_":1542258196392968192},
 {
   "id":"9932722",
   

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Your initParams section does not apply to /dataimport handler as
defined. Try modifying it to say:
path="/update/**,/dataimport"

Hopefully, that's all that takes.

Managed schema is enabled by default, but schemaless mode is the next
layer on top. With managed schema, you can use the API to add your
fields (or new Admin UI in the Schema screen). With schemaless mode,
it tries to guess the field type as it adds it automatically.


Regards,
Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 August 2016 at 18:04, Pierre Caserta  wrote:
> Hi Alex,
> thanks for your answer.
>
> Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
>
>   
> 
>   add-unknown-fields-to-the-schema
> 
>   
>
> I created my core using this command:
>
> curl 
> http://192.168.99.100:8999/solr/admin/cores?action=CREATE=solrexchange=/opt/solr/server/solr/solrexchange=data_driven_schema_configs_custom
>
> I am using the example configset data_driven_schema_configs and I simply 
> added:
>
>regex="solr-dataimporthandler-.*\.jar" />
>   
>   
> data-config.xml
>   
>   
>
> I thought the schemaless mode was enabled by default but I also tried adding 
> this config but I get the same result.
>
>   <schemaFactory class="ManagedIndexSchemaFactory">
> <bool name="mutable">true</bool>
> <str name="managedSchemaResourceName">managed-schema</str>
>   </schemaFactory>
>
> How can I update my schemaless URP chain and add the parameter to call it to 
> DIH?
>
>

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
Hi Alex,
thanks for your answer.

Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.

  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>

I created my core using this command:

curl 
http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/solr/server/solr/solrexchange&configSet=data_driven_schema_configs_custom

I am using the example configset data_driven_schema_configs and I simply added:

  
  
  
data-config.xml
  
  

I thought the schemaless mode was enabled by default, but I also tried adding 
this config and I get the same result.

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

How can I update my schemaless URP chain and add the parameter to call it to 
DIH?


> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch  wrote:
> 
> Do you have the actual fields defined? If not, then I am guessing that
> your 'post' test was against a different collection that had
> schemaless mode enabled and your DIH one is against one where
> schemaless mode is not enabled (look for
> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
> Solr examples for DIH do not have schemaless mode enabled.
> 
> I _believe_ you can copy the schemaless URP chain and add the
> parameter to call it to DIH handler and it _should_ work. But I am not
> betting on it without testing it, as DIH also has some magic code to
> ignore fields not defined in schema because it is designed to work
> with only extracting relevant fields from the database even with
> 'select *' statement.
> 
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 17:12, Pierre Caserta  wrote:
>> Hi,
>> It seems that using the DataImportHandler with a XPathEntityProcessor config
>> with a managed-schema setup, only import the id and version field.
>> 
>> data-config.xml
>> 
>> 
>>
>>
>>>processor="XPathEntityProcessor"
>>stream="true"
>>forEach="/posts/row/"
>>url="${dataimporter.request.dataurl}"
>> 
>> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
>>> 
>>
>>
>>> xpath="/posts/row/@AcceptedAnswerId" />
>>> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>>
>>
>>> />
>>
>>> xpath="/posts/row/@LastEditorUserId" />
>>> xpath="/posts/row/@LastEditorDisplayName" />
>>> xpath="/posts/row/@LastActivityDate"
>> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>>
>>> regex="(.*)" />
>>> splitBy="" />
>>
>>> />
>>> />
>>> xpath="/posts/row/@CommunityOwnedDate"
>> dateTimeFormat="-MM-dd'T'hh:mm:ss.SSS" />
>>
>>
>> 
>> 
>> 
>> http://192.168.99.100:8999/solr/solrexchange/select?indent=on=*:*=json
>> {
>>  "responseHeader":{
>>"status":0,
>>"QTime":0,
>>"params":{
>>  "q":"*:*",
>>  "indent":"on",
>>  "wt":"json",
>>  "_":"1470811193595"}},
>>  "response":{"numFound":8,"start":0,"docs":[
>>  {
>>"id":"38822",
>>"_version_":1542258196375142400},
>>  {
>>"id":"38836",
>>"_version_":1542258196387725312},
>>  {
>>"id":"63896",
>>"_version_":1542258196388773888},
>>  {
>>"id":"65406",
>>"_version_":1542258196391919616},
>>  {
>>"id":"1357173",
>>"_version_":1542258196391919617},
>>  {
>>"id":"5339763",
>>"_version_":1542258196392968192},
>>  {
>>"id":"9932722",
>>"_version_":1542258196392968193},
>>  {
>>"id":"9217299",
>>"_version_":1542258196392968194}]
>>  }}
>> 
>> data_search.xml (8 rows)
>> 
>> 
>> 
>> the url I am hitting (with custom dataurl parameter)
>> 
>> curl
>> 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import=true=/code/solr/data/search/dih/data_search.xml'
>> 
>> I changed my data to useand use the bin/post tool and
>> this is working as expected.
>> Now I am interested to make it work with the DataImportHandler.
>> How can I use the DataImportHandler to import my document ?
>> 
>> Thanks,
>> Pierre Caserta
>> 
>> 



Typical Error with live SOLR nodes

2016-08-10 Thread Ritesh Kumar (Avanade)
Hi All,

We have 3 ZK VMs and 3 Solr VMs (Windows) running Solr 6, and we have implemented CDCR. 
A dedicated drive has been set up for Solr and ZK separately.

The issue we are facing is that two Solr nodes show up as live nodes together while the 
third shows up on its own, even though all of them point at the same external ZooKeeper 
ensemble. Please note that restarting the ZK Windows services or restarting the VMs 
temporarily fixes the issue.
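
One way to cross-check what each ensemble member reports, assuming the stock
ZooKeeper CLI is available on the ZK VMs (adjust the path if Solr uses a chroot),
is to list the registered live nodes against each server:

  zkCli.cmd -server ZKHostIP:2181
  ls /live_nodes

(repeat against :2182 and :2183 to see whether the three servers agree)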

Please find the solr status details from 2 SOLR VM's:

F:\solr-6.1.0\bin>solr status

Found Solr process 3612 running on port 8984
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:50:08.928Z",
  "uptime":"0 days, 0 hours, 9 minutes, 3 seconds",
  "memory":"150.2 MB (%3.8) of 3.8 GB",
  "cloud":{
"ZooKeeper":"ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"2",
"collections":"0"}}


F:\solr-6.1.0\bin>solr status

Found Solr process 5100 running on port 8983
{
  "solr_home":"F:\\solr-6.1.0\\server\\solr",
  "version":"6.1.0 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-
13 09:46:58",
  "startTime":"2016-08-03T17:49:28.868Z",
  "uptime":"0 days, 0 hours, 5 minutes, 9 seconds",
  "memory":"156.9 MB (%4) of 3.8 GB",
  "cloud":{
"ZooKeeper":" ZKHostIP:2181, ZKHostIP:2182, ZKHostIP:2183",
"liveNodes":"1",
"collections":"0"}}


Ritesh K
Infrastructure Sr. Engineer - Jericho Team
Sales & Marketing Digital Services
t +91-7799936921   v-kur...@microsoft.com





Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Do you have the actual fields defined? If not, then I am guessing that
your 'post' test was against a different collection that had
schemaless mode enabled and your DIH one is against one where
schemaless mode is not enabled (look for
'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
Solr examples for DIH do not have schemaless mode enabled.

I _believe_ you can copy the schemaless URP chain and add the
parameter to call it to DIH handler and it _should_ work. But I am not
betting on it without testing it, as DIH also has some magic code to
ignore fields not defined in schema because it is designed to work
with only extracting relevant fields from the database even with
'select *' statement.
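
As a rough, untested sketch (the chain name below assumes the stock
data_driven_schema_configs naming), wiring the chain into the DIH handler could
look something like:

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </requestHandler>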


Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 August 2016 at 17:12, Pierre Caserta  wrote:
> Hi,
> It seems that using the DataImportHandler with a XPathEntityProcessor config
> with a managed-schema setup, only import the id and version field.
>
> data-config.xml
>
> 
> 
> 
>  processor="XPathEntityProcessor"
> stream="true"
> forEach="/posts/row/"
> url="${dataimporter.request.dataurl}"
>
> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
> >
> 
> 
>  xpath="/posts/row/@AcceptedAnswerId" />
>  dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
> 
> 
>  />
> 
>  xpath="/posts/row/@LastEditorUserId" />
>  xpath="/posts/row/@LastEditorDisplayName" />
>  xpath="/posts/row/@LastActivityDate"
> dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
> 
>  regex="(.*)" />
>  splitBy="" />
> 
>  />
>  />
>  xpath="/posts/row/@CommunityOwnedDate"
> dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
> 
> 
> 
>
>
> http://192.168.99.100:8999/solr/solrexchange/select?indent=on&q=*:*&wt=json
> {
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"*:*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1470811193595"}},
>   "response":{"numFound":8,"start":0,"docs":[
>   {
> "id":"38822",
> "_version_":1542258196375142400},
>   {
> "id":"38836",
> "_version_":1542258196387725312},
>   {
> "id":"63896",
> "_version_":1542258196388773888},
>   {
> "id":"65406",
> "_version_":1542258196391919616},
>   {
> "id":"1357173",
> "_version_":1542258196391919617},
>   {
> "id":"5339763",
> "_version_":1542258196392968192},
>   {
> "id":"9932722",
> "_version_":1542258196392968193},
>   {
> "id":"9217299",
> "_version_":1542258196392968194}]
>   }}
>
> data_search.xml (8 rows)
>
>
>
> the url I am hitting (with custom dataurl parameter)
>
> curl
> 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import=true=/code/solr/data/search/dih/data_search.xml'
>
> I changed my data to useand use the bin/post tool and
> this is working as expected.
> Now I am interested to make it work with the DataImportHandler.
> How can I use the DataImportHandler to import my document ?
>
> Thanks,
> Pierre Caserta
>
>


display filter based on existence of facet

2016-08-10 Thread Derek Poh
I have a couple of filters that are text-input based, where the user will 
input a value into the text boxes of these filters.
The condition is that these filters should only be displayed if the 
corresponding facets exist in the search result.
E.g. the Min Order Qty filter will be displayed if the Min Order Qty facet 
exists in the Solr result.


To display this filter, I only need to 'know' there is a value to filter on.
Currently all the possible terms and counts of the Min Order Qty field are 
returned for this facet.


Any suggestions on how I can avoid the computation of the possible terms 
and their counts for the facet field and hence reduce the computational 
time of the query?

I just need to know there is 'a value to filter on'.

These are the parameters of the query that is used to display the list of 
filters.

group.field=P_SupplierId=true=true=0=0=coffee=P_SupplierSource:(1)=true=1=P_CNState=P_BusinessType=P_CombinedBusTypeFlat=P_CombinedCompCertFlat=P_CombinedExportCountryFlat=P_CombinedProdCertFlat=P_Country=P_CSFParticipant=P_FOBPriceMinFlag=P_FOBPriceMaxFlag=P_HasAuditInfo=P_HasCreditInfo=P_LeadTime=P_Microsite=P_MinOrderQty=P_MonthlyCapacityFlag=P_OEMServices=P_PSEParticipant=P_SupplierRanking=P_SupplierUpcomingTradeShow=P_YearsInBusiness=P_SmallOrderFlag

Using solr 4.10.4

Thank you,
Derek


DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
Hi,
It seems that using the DataImportHandler with a XPathEntityProcessor config 
with a managed-schema setup, only import the id and version field.

data-config.xml

            processor="XPathEntityProcessor"
            stream="true"
            forEach="/posts/row/"
            url=""
            transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
            >

http://192.168.99.100:8999/solr/solrexchange/select?indent=on&q=*:*&wt=json
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json",
      "_":"1470811193595"}},
  "response":{"numFound":8,"start":0,"docs":[
      {
        "id":"38822",
        "_version_":1542258196375142400},
      {
        "id":"38836",
        "_version_":1542258196387725312},
      {
        "id":"63896",
        "_version_":1542258196388773888},
      {
        "id":"65406",
        "_version_":1542258196391919616},
      {
        "id":"1357173",
        "_version_":1542258196391919617},
      {
        "id":"5339763",
        "_version_":1542258196392968192},
      {
        "id":"9932722",
        "_version_":1542258196392968193},
      {
        "id":"9217299",
        "_version_":1542258196392968194}]
  }}

data_search.xml (8 rows)

the url I am hitting (with custom dataurl parameter)

curl 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import=true=/code/solr/data/search/dih/data_search.xml'

I changed my data to use and use the bin/post tool and this is working as expected.
Now I am interested to make it work with the DataImportHandler.
How can I use the DataImportHandler to import my document?

Thanks,
Pierre Caserta
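
For reference, a bin/post run of that shape would look roughly like this (the 
input file path below is only illustrative):

  bin/post -c solrexchange /code/solr/data/search/data_as_update_xml.xml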

Re: Getting dynamic fields using LukeRequest.

2016-08-10 Thread Pranaya Behera
Also, when I hit the request for each individual shard using the /admin/luke 
endpoint I get results that are close to what I expect, but against the whole 
collection it does not even show that it has dynamic fields.
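
A minimal sketch of hitting one core directly from SolrJ, for comparison (the
host and core name below are just placeholders):

  // uses org.apache.solr.client.solrj.impl.HttpSolrClient plus the Luke request/response classes above
  HttpSolrClient coreClient = new HttpSolrClient("http://solrhost:8983/solr/product_shard1_replica1");
  LukeRequest perCoreLuke = new LukeRequest();
  perCoreLuke.setNumTerms(0);
  // runs /admin/luke against that single core only
  LukeResponse perCoreResponse = perCoreLuke.process(coreClient);
  Map<String, LukeResponse.FieldInfo> perCoreFields = perCoreResponse.getFieldInfo();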


On 10/08/16 11:23, Pranaya Behera wrote:

Hi Steve,
  I did look at the Schema API, but it only gives the 
defined dynamic fields, not the indexed dynamic fields. For fields indexed 
under the rules of a defined dynamic field, I guess LukeRequest 
is the only option. (Please correct me if I am wrong.)
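
For instance, a quick SolrJ sketch of what the Schema API does return, i.e. the
dynamic field pattern definitions rather than the concrete indexed fields
(assuming the stock SchemaRequest/SchemaResponse classes):

  // org.apache.solr.client.solrj.request.schema.SchemaRequest / response.schema.SchemaResponse
  SchemaRequest.DynamicFields dynamicFieldsRequest = new SchemaRequest.DynamicFields();
  SchemaResponse.DynamicFieldsResponse dynamicFieldsResponse = dynamicFieldsRequest.process(cloudSolrClient);
  // returns patterns such as "*_s", not the actual field names indexed under them
  List<Map<String, Object>> dynamicFieldDefinitions = dynamicFieldsResponse.getDynamicFields();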


Hence I am unable to fetch each and every indexed field that matches a 
defined dynamic field.


On 09/08/16 19:26, Steve Rowe wrote:
Not sure what the issue is with LukeRequest, but Solrj has Schema API 
support: 



You can see which options are supported here: 



--
Steve
www.lucidworks.com

On Aug 9, 2016, at 8:52 AM, Pranaya Behera  
wrote:


Hi,
 I have the following script to retrieve all the fields in the 
collection. I am using SolrCloud 6.1.0.

LukeRequest lukeRequest = new LukeRequest();
lukeRequest.setNumTerms(0);
lukeRequest.setShowSchema(false);
LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) {
  entry.getKey(); // Here fieldInfoMap is size of 0 for sometime and sometime it is getting incomplete data.
}


Setting showSchema to true doesn't yield any result. Only making it 
false yields results, and even then the data is incomplete. As far as I can 
see, the doc has more fields than what the response is saying it has.


LukeRequest hits 
/solr/product/admin/luke?numTerms=0&wt=javabin&version=2 HTTP/1.1 .


How it should be configured for solrcloud ?
I have already mentioned

class="org.apache.solr.handler.admin.LukeRequestHandler" />


in the solrconfig.xml. It doesn't matter whether it is present in 
the solrconfig or not as I am requesting it from solrj.