Grouping and sorting Together

2019-11-14 Thread neotorand
Hi List
I need your help with a problem I have been struggling with for days.
Let's take an example of shoes that are grouped by size and price.

For the first group, with size and price "7 and 7000", I have 2 documents as
below:

{id:1,color:blue,item sold:10}
{id:5,color:yellow,item sold:1}


For the second group, with size and price "8 and 8000", I have 2 documents as
below:

{id:2,color:blue,item sold:3}
{id:3,color:yellow,item sold:5}

Now I want to sort the records based on item sold.
How should I look at this problem? Should I remove the grouping, sort the
results, and show them? I am asking because, as you can see, the first group
has items with item sold of 10 and 1, and the second group has 3 and 5.
What approach should I take?
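
The two options can be sketched in plain Java, outside of Solr. This is only
an illustration of the ordering each option produces, not a Solr query: the
`Shoe` record, its field names, and the class `GroupSortSketch` are
hypothetical, and records assume Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupSortSketch {

    // Option 1: drop the grouping and sort every document by items sold, highest first.
    static List<Shoe> sortFlat(List<Shoe> docs) {
        List<Shoe> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(Shoe::itemsSold).reversed());
        return sorted;
    }

    // Option 2: keep the (size, price) groups, sort inside each group,
    // then order the groups by their best-selling document.
    static List<List<Shoe>> sortGrouped(List<Shoe> docs) {
        Map<String, List<Shoe>> groups = new LinkedHashMap<>();
        for (Shoe s : docs) {
            groups.computeIfAbsent(s.size() + "/" + s.price(), k -> new ArrayList<>()).add(s);
        }
        List<List<Shoe>> result = new ArrayList<>();
        for (List<Shoe> group : groups.values()) {
            result.add(sortFlat(group));
        }
        // After the inner sort, element 0 of each group is its best seller.
        result.sort(Comparator.comparingInt((List<Shoe> g) -> g.get(0).itemsSold()).reversed());
        return result;
    }

    // Four documents modelled on the example above.
    static List<Shoe> sampleDocs() {
        return List.of(
            new Shoe(1, 7, 7000, "blue", 10),
            new Shoe(5, 7, 7000, "yellow", 1),
            new Shoe(2, 8, 8000, "blue", 3),
            new Shoe(3, 8, 8000, "yellow", 5));
    }

    public static void main(String[] args) {
        System.out.println("Flat:    " + sortFlat(sampleDocs()));
        System.out.println("Grouped: " + sortGrouped(sampleDocs()));
    }
}

// Hypothetical model of one document; the field names are assumptions.
record Shoe(int id, int size, int price, String color, int itemsSold) {}
```

With the sample data, the flat sort interleaves the two groups (ids 1, 3, 2,
5), while the grouped sort keeps each group together. Solr's result grouping
documents a `sort` parameter for ordering groups and a `group.sort` parameter
for ordering documents inside a group, which corresponds roughly to option 2.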

Regards
Neo







--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Unable to Create a Core

2018-07-06 Thread neotorand
Hi List,
I am unable to create a core and cannot figure out what is wrong.
I get the error below.

ERROR: Failed to create collection 'XXX' due to: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
from server at 
http://xyz.com:8983/solr: 
Error CREATEing SolrCore 'docpocc_shard1_replica1': 
Unable to create core [docpocc_shard1_replica1] Caused by: Missing required
init param 'defaultFieldType' 

In my Solr config file I have the init param as below:

  

  _text_

  

Any help or pointers would be appreciated. Thanks in advance.


Regards
Neo





Re: Indexing part of Binary Documents and not the entire contents

2018-07-06 Thread neotorand
Gus
You are never biased.
I explored JesterJ a bit. It looks quite promising.
I will keep you posted on my experience soon.

Regards
Neo






Re: Indexing part of Binary Documents and not the entire contents

2018-06-27 Thread neotorand
Thanks Erick
I have already gone through the link to the Tika example you shared.
Please look at the code in bold.
I believe the entire content is still pushed into memory through the handler
object.
Sorry, I copied lengthy code from the Tika site.

Regards
Neo

*Streaming the plain text in chunks*
Sometimes, you want to chunk the resulting text up, perhaps to output as you
go minimising memory use, perhaps to output to HDFS files, or any other
reason! With a small custom content handler, you can do that.

public List<String> parseToPlainTextChunks() throws IOException,
        SAXException, TikaException {
    // All chunks are still accumulated in this in-memory list.
    final List<String> chunks = new ArrayList<>();
    chunks.add("");
    ContentHandlerDecorator handler = new ContentHandlerDecorator() {
        @Override
        public void characters(char[] ch, int start, int length) {
            String lastChunk = chunks.get(chunks.size() - 1);
            String thisStr = new String(ch, start, length);

            // Start a new chunk once the current one would exceed the limit.
            if (lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
                chunks.add(thisStr);
            } else {
                chunks.set(chunks.size() - 1, lastChunk + thisStr);
            }
        }
    };

    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    try (InputStream stream =
            ContentHandlerExample.class.getResourceAsStream("test2.doc")) {
        // The call in question (shown in bold in the original post).
        parser.parse(stream, handler, metadata);
        return chunks;
    }
}





Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Shawn,

Yes, I agree ERH is not recommended in production.
I am writing my own custom one.
Any pointers on this?

What I am looking for is exactly that: a custom indexing program that compiles
precisely the information I need and sends it to Solr.
On the other hand, I see that the method below is very expensive if the
document is large:
 autoParser.parse(input, textHandler, metadata, context);

The ContentHandler would hold the entire content in memory.
Any suggestions?
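
One direction I can think of, as a rough sketch only: since the parse call
accepts any SAX `ContentHandler`, a custom handler could discard text as it
streams past and keep only the runs that mention a configured keyword. The
class `KeywordSnippetHandler`, its constructor arguments, and the snippet cap
are hypothetical names for illustration; the sketch uses only the JDK's
`org.xml.sax` classes so it runs without Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps only text runs that mention a configured keyword,
// so memory use is bounded by the matches rather than by the document size.
class KeywordSnippetHandler extends DefaultHandler {
    private final List<String> keywords = new ArrayList<>(); // lower-cased keywords
    private final int maxSnippets;                            // hard cap on retained runs
    private final List<String> snippets = new ArrayList<>();

    KeywordSnippetHandler(List<String> keywords, int maxSnippets) {
        for (String k : keywords) {
            this.keywords.add(k.toLowerCase(Locale.ROOT));
        }
        this.maxSnippets = maxSnippets;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (snippets.size() >= maxSnippets) {
            return; // cap reached: ignore all further text
        }
        String run = new String(ch, start, length);
        String lower = run.toLowerCase(Locale.ROOT);
        for (String k : keywords) {
            if (lower.contains(k)) {
                snippets.add(run.trim());
                return; // store each run at most once
            }
        }
        // No keyword in this run: nothing is retained.
    }

    List<String> getSnippets() {
        return snippets;
    }

    public static void main(String[] args) {
        KeywordSnippetHandler handler = new KeywordSnippetHandler(List.of("invoice"), 10);
        // Simulate the parser delivering text in separate callbacks.
        for (String run : List.of("Quarterly summary.", " Invoice 42 is due. ", "Footer text")) {
            char[] chars = run.toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        System.out.println(handler.getSnippets());
    }
}
```

In a Tika-based indexer the handler would be passed in place of `textHandler`
in the `parse` call above. One limitation of this sketch: a keyword that is
split across two `characters` callbacks is missed, so a real implementation
would need a small overlap buffer.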

Regards
Neo





Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Erick,

I have seen this article in several places but never went through it
seriously.

Don't you think the method below is very expensive?

autoParser.parse(input, textHandler, metadata, context);


If the document is large, it will need enough memory to hold the whole
document (i.e. in the ContentHandler).
Is there any alternative?

Regards
Neo





Indexing part of Binary Documents and not the entire contents

2018-06-20 Thread neotorand
Hi List,
I have a specific requirement where I need to index the following:

Metadata of any document
Parts of the document that match keywords that I configure

I am able to achieve the first part through ERH or FileListEntityProcessor.

I am struggling with the second part. I am looking for an effective and smart
approach to handle this.
Can anyone give me a pointer or help with this?

Thanks in advance!


Regards
Neo





Re: Decision on Number of shards and collection

2018-04-13 Thread neotorand
Hi Shawn,
Thanks for the long explanation.
So the 2 billion limit can be overcome by using shards.

Now, coming back to collections: unless we have a logical or business reason,
we should not go for more than one collection.

Let's say I have 5 different entities with 10, 20, 30, 40 and 50 attributes
(columns) respectively to be indexed/stored.
If I store them in a single collection, is empty space created in any way?
Put differently, if I store heterogeneous data items in a single collection,
is memory poorly utilized because of empty holes being created?

What are the pros and cons of a single collection vs. multiple collections?

Thanks, team, for spending your valuable time to clarify.

Regards
Neo







Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Emir
I read in the link you shared that
"Shard cannot contain more than 2 billion documents since Lucene is using
integer for internal IDs."

In which Java class of the Solr implementation can this be found?
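
For context, as far as I can tell the limit comes from Lucene rather than Solr
itself: document IDs are Java `int` values, and
`org.apache.lucene.index.IndexWriter` exposes a `MAX_DOCS` constant slightly
below `Integer.MAX_VALUE`. The sketch below re-declares that value so it runs
without Lucene on the classpath; the class `DocIdLimitSketch` and the method
`fitsInOneShard` are hypothetical names.

```java
class DocIdLimitSketch {

    // Assumption: mirrors Lucene's IndexWriter.MAX_DOCS (Integer.MAX_VALUE - 128),
    // re-declared here so the sketch has no Lucene dependency.
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    // True if a single shard (one Lucene index) could hold this many documents.
    static boolean fitsInOneShard(long docCount) {
        return docCount >= 0 && docCount <= MAX_DOCS;
    }

    public static void main(String[] args) {
        System.out.println("Per-index document limit: " + MAX_DOCS);
        System.out.println("2 billion fits: " + fitsInOneShard(2_000_000_000L));
        System.out.println("3 billion fits: " + fitsInOneShard(3_000_000_000L));
    }
}
```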

Regards
Neo





Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Thanks, everyone, for your beautiful explanations and valuable time.

Thanks Emir for the nice link
(http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html)
Thanks Shawn for
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

When should we have more collections?

When we have a business reason to keep the data in separate collections
When we don't need to query all the data at once

When should we have more shards?
Define the acceptable latency.
Keep adding documents to a shard for as long as latency stays acceptable.
That defines the shard size (SS).
Get the size of all data to be indexed (TS).
numshards = TS/SS
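
The arithmetic above can be written as a small helper. This is a minimal
sketch, assuming both sizes are measured in the same unit (GB here) and that
the result should be rounded up so a partial shard still counts; the class and
method names are hypothetical.

```java
class ShardCountSketch {

    // numShards = TS / SS, rounded up, with a minimum of one shard.
    static int numShards(long totalSizeGb, long shardSizeGb) {
        if (totalSizeGb < 0 || shardSizeGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long shards = (totalSizeGb + shardSizeGb - 1) / shardSizeGb; // ceiling division
        return Math.toIntExact(Math.max(1, shards));
    }

    public static void main(String[] args) {
        // Example: 1000 GB of data, 300 GB per shard at acceptable latency.
        System.out.println(numShards(1000, 300));
    }
}
```

Rounding up matters here: 1000 GB at 300 GB per shard needs a fourth shard for
the remaining 100 GB, whereas plain integer division would give 3.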

One quick question.
@Shawn
If I have data in more than one collection, can I still query them all at
once? I think yes, from what I read on the Solr site.
What are the pros and cons of a single collection vs. multiple collections?

I have gone through the article on estimating memory and storage for Solr
from Lucid
(https://lucidworks.com/2011/09/14/estimating-memory-and-storage-for-lucenesolr/).

@SOLR4189 I will go through the book and get back to you. Thanks.

Time is too short to explore a long-lived open source technology.

Regards
Neo





Re: Indexing fails with partially done

2018-04-11 Thread neotorand
Thanks Emir.
In the context of DIH, do we have any resume mechanism?

Regards
Neo






Re: Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Emir,
Thanks a lot for your reply.
So when I design a Solr ecosystem, I should start with a rough guess at the
number of shards and increase it to improve performance. What is the
accepted/ideal response time? There should be a trade-off between response
time and the number of shards as the data keeps growing.

I agree we split our index when response time increases. So what would that
response time or query latency threshold be?

Thanks again!


Regards
priyadarshi







Indexing fails with partially done

2018-04-11 Thread neotorand
With SolrCloud, what happens if indexing is partially completed and the
ZooKeeper ensemble goes down? What are the ways to resume? In one scenario I
am using a 3-node ZK ensemble. Let's say I am indexing 5 million documents,
I have partially indexed the data, and the ZK ensemble goes down. What would
be the best approach for handling such a scenario?

Regards 
Neo 





Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Team
First of all, I take this opportunity to thank you all for creating a
beautiful place where people can explore, learn and debate.

I have been on my knees for a couple of days trying to decide on this.

When I am creating a SolrCloud ecosystem, I need to decide on the number of
shards and collections.
What are the best practices for making this decision?

I believe heterogeneous data can be indexed into the same collection, and I
can have multiple shards so that the index is partitioned. So what is the need
for a second collection? Yes, when the collection size grows I should look at
more collections. What exactly is that size? What KPI drives the decision to
have more collections? Any pointers or links to best practices are welcome.

When should I go for multiple shards?
When the shard size grows, right? What is that size, and how do I benchmark it?

I am sorry if this question has already been asked, but I have googled the
whole ecospace: Quora, Stack Overflow, Lucid.

Regards
Neo




