In Java: UUID.randomUUID();

That is what I'm using.
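
A minimal sketch of that, assuming the id field is a plain string field. The class name IdGenerator and the helper timestampId are made up for illustration; only the pattern string comes from the quoted generateID(). It also shows why that function collides: in SimpleDateFormat, "M" is the month and "s" is the second ("S" would be the millisecond), so "yyMMddhhmmssMs" only resolves to one second.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class IdGenerator {

    // The pattern from the quoted message. The trailing "Ms" repeats the
    // month and the second; it does not add milliseconds.
    static String timestampId(Date when) {
        return new SimpleDateFormat("yyMMddhhmmssMs").format(when);
    }

    // Random (version 4) UUID rendered as a 36-character string.
    static String generateID() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Two instants 500 ms apart that fall inside the same second.
        Date first = new Date(1437514680000L);
        Date second = new Date(first.getTime() + 500);

        // Prints true: the timestamp ids are identical, so the second
        // document would overwrite the first in Solr.
        System.out.println(timestampId(first).equals(timestampId(second)));

        // Generate a batch of UUID-based ids and count the distinct ones.
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < 100000; i++) {
            ids.add(generateID());
        }
        System.out.println(ids.size());
    }
}
```

Note that "hh" in that pattern is the 12-hour clock, so ids generated twelve hours apart can collide as well. With batches of 1000 and several child documents per parent, many ids are generated within the same second, which would explain the missing documents.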

Regards

> On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
> 
> Hi Upayavira,
> 
> I guess that is the problem. I am currently using a function for generating
> an ID. It takes the current date and time to milliseconds and generates the
> id. This is the function.
> 
> public static String generateID(){
>        Date dNow = new Date();
>        SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
>        String datetime = ft.format(dNow);
>        return datetime;
>    }
> 
> 
> I believe that despite having a millisecond precision in the id generation,
> multiple objects are being assigned the same ID. Can you suggest a better
> way to generate the ID?
> 
> Regards,
> Vineeth
> 
> 
>> On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:
>> 
>> Are you making sure that every document has a unique ID? Index into an
>> empty Solr, then look at your maxdocs vs numdocs. If they are different
>> (maxdocs is higher) then some of your documents have been deleted,
>> meaning some were overwritten.
>> 
>> That might be a place to look.
>> 
>> Upayavira
>> 
>>> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
>>> I can confirm this behavior. I see it when sending JSON docs in batches:
>>> it never happens when sending them one by one, but it is sporadic with batches.
>>> 
>>> It is as if Solr/Jetty drops a couple of documents out of the batch.
>>> 
>>> Regards
>>> 
>>>> On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Thank you, Erick, for your inputs. I tried creating batches of 1000 objects
>>>> and indexing them to Solr. The performance is way better than before, but I
>>>> find that the number of indexed documents shown in the dashboard is
>>>> lower than the number of documents that I had actually indexed through
>>>> SolrJ. My code is as follows:
>>>> 
>>>> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
>>>> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
>>>> private static JSONParser parser = new JSONParser();
>>>> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
>>>> 
>>>> public static void main(String[] args) throws IOException,
>>>> SolrServerException, ParseException {
>>>>       File file = new File(JSON_FILE_PATH);
>>>>       Scanner scn=new Scanner(file,"UTF-8");
>>>>       JSONObject object;
>>>>       int i = 0;
>>>>       Collection<SolrInputDocument> batch = new
>>>> ArrayList<SolrInputDocument>();
>>>>       while(scn.hasNext()){
>>>>           object= (JSONObject) parser.parse(scn.nextLine());
>>>>           SolrInputDocument doc = indexJSON(object);
>>>>           batch.add(doc);
>>>>           if(i%1000==0){
>>>>               System.out.println("Indexed " + (i+1) + " objects." );
>>>>               solr.add(batch);
>>>>               batch = new ArrayList<SolrInputDocument>();
>>>>           }
>>>>           i++;
>>>>       }
>>>>       solr.add(batch);
>>>>       solr.commit();
>>>>       System.out.println("Indexed " + (i+1) + " objects." );
>>>> }
>>>> 
>>>> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>>>> ParseException, IOException, SolrServerException {
>>>>   Collection<SolrInputDocument> batch = new
>>>> ArrayList<SolrInputDocument>();
>>>> 
>>>>   SolrInputDocument mainEvent = new SolrInputDocument();
>>>>   mainEvent.addField("id", generateID());
>>>>   mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>   mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>   mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>   mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>   mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>   mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>> 
>>>>   Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>   JSONObject userObj = (JSONObject) obj;
>>>> 
>>>>   SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>   childUserEvent.addField("id", generateID());
>>>>   childUserEvent.addField("User", userObj.get("User"));
>>>> 
>>>>   obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>   JSONObject eventdescriptionObj = (JSONObject) obj;
>>>> 
>>>>   SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>   childEventDescEvent.addField("id", generateID());
>>>>   childEventDescEvent.addField("EventApplicationName",
>>>> eventdescriptionObj.get("EventApplicationName"));
>>>>   childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>> 
>>>>   obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>   JSONArray informationArray = (JSONArray) obj;
>>>> 
>>>>   for(int i = 0; i<informationArray.size(); i++){
>>>>       JSONObject domain = (JSONObject) informationArray.get(i);
>>>> 
>>>>       SolrInputDocument domainDoc = new SolrInputDocument();
>>>>       domainDoc.addField("id", generateID());
>>>>       domainDoc.addField("domainName", domain.get("domainName"));
>>>> 
>>>>       String s = domain.get("columns").toString();
>>>>       obj= JSONValue.parse(s);
>>>>       JSONArray ColumnsArray = (JSONArray) obj;
>>>> 
>>>>       SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>       columnsDoc.addField("id", generateID());
>>>> 
>>>>       for(int j = 0; j<ColumnsArray.size(); j++){
>>>>           JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>           SolrInputDocument columnDoc = new SolrInputDocument();
>>>>           columnDoc.addField("id", generateID());
>>>>           columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>           columnsDoc.addChildDocument(columnDoc);
>>>>       }
>>>>       domainDoc.addChildDocument(columnsDoc);
>>>>       childEventDescEvent.addChildDocument(domainDoc);
>>>>   }
>>>> 
>>>>   mainEvent.addChildDocument(childEventDescEvent);
>>>>   mainEvent.addChildDocument(childUserEvent);
>>>>   return mainEvent;
>>>> }
>>>> 
>>>> I would be grateful if you could let me know what I am missing.
>>>> 
>>>> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> 
>>>>> First thing is it looks like you're only sending one document at a
>>>>> time, perhaps with child objects. This is not optimal at all. I
>>>>> usually batch my docs up in groups of 1,000, and there is anecdotal
>>>>> evidence that there may (depending on the docs) be some gains above
>>>>> that number. Gotta balance the batch size off against how big the docs
>>>>> are of course.
>>>>> 
>>>>> Assuming that you really are calling this method for one doc (and
>>>>> children) at a time, the far bigger problem other than calling
>>>>> server.add for each parent/children is that you're then calling
>>>>> solr.commit() every time. This is an anti-pattern. Generally, let the
>>>>> autoCommit setting in solrconfig.xml handle the intermediate commits
>>>>> while the indexing program is running and only issue a commit at the
>>>>> very end of the job if at all.
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
>>>>> <vineeth.ii...@gmail.com> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I am trying to index JSON objects (which contain nested JSON objects and
>>>>>> arrays) into Solr.
>>>>>> 
>>>>>> My JSON object looks like the following (this is fake data that I am using
>>>>>> for this example):
>>>>>> 
>>>>>> {
>>>>>>   "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
>>>>>>   "EventUid": "1279706565",
>>>>>>   "TimeOfEvent": "2015-05-01-08-07-13",
>>>>>>   "TimeOfEventUTC": "2015-05-01-01-07-13",
>>>>>>   "EventCollector": "kafka",
>>>>>>   "EventMessageType": "kafka-@column",
>>>>>>   "User": {
>>>>>>       "User": "Lorem ipsum",
>>>>>>       "UserGroup": "Manager",
>>>>>>       "Location": "consectetur adipiscing",
>>>>>>       "Department": "Legal"
>>>>>>   },
>>>>>>   "EventDescription": {
>>>>>>       "EventApplicationName": "",
>>>>>>       "Query": "SELECT * FROM MOVIES",
>>>>>>       "Information": [
>>>>>>           {
>>>>>>               "domainName": "English",
>>>>>>               "columns": [
>>>>>>                   {
>>>>>>                       "movieName": "Casablanca",
>>>>>>                       "duration": "154"
>>>>>>                   },
>>>>>>                   {
>>>>>>                       "movieName": "Die Hard",
>>>>>>                       "duration": "127"
>>>>>>                   }
>>>>>>               ]
>>>>>>           },
>>>>>>           {
>>>>>>               "domainName": "Hindi",
>>>>>>               "columns": [
>>>>>>                   {
>>>>>>                       "movieName": "DDLJ",
>>>>>>                       "duration": "176"
>>>>>>                   }
>>>>>>               ]
>>>>>>           }
>>>>>>       ]
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> My function for indexing the object is as follows:
>>>>>> 
>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
>>>>>> IOException, SolrServerException {
>>>>>>   Collection<SolrInputDocument> batch = new
>>>>>> ArrayList<SolrInputDocument>();
>>>>>> 
>>>>>>   SolrInputDocument mainEvent = new SolrInputDocument();
>>>>>>   mainEvent.addField("id", generateID());
>>>>>>   mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>>>   mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>>>   mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>>>   mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>>>   mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>>>   mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>>>> 
>>>>>>   Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>>>   JSONObject userObj = (JSONObject) obj;
>>>>>> 
>>>>>>   SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>>>   childUserEvent.addField("id", generateID());
>>>>>>   childUserEvent.addField("User", userObj.get("User"));
>>>>>> 
>>>>>>   obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>>>   JSONObject eventdescriptionObj = (JSONObject) obj;
>>>>>> 
>>>>>>   SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>>>   childEventDescEvent.addField("id", generateID());
>>>>>>   childEventDescEvent.addField("EventApplicationName",
>>>>>> eventdescriptionObj.get("EventApplicationName"));
>>>>>>   childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>>>> 
>>>>>>   obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>>>   JSONArray informationArray = (JSONArray) obj;
>>>>>> 
>>>>>>   for(int i = 0; i<informationArray.size(); i++){
>>>>>>       JSONObject domain = (JSONObject) informationArray.get(i);
>>>>>> 
>>>>>>       SolrInputDocument domainDoc = new SolrInputDocument();
>>>>>>       domainDoc.addField("id", generateID());
>>>>>>       domainDoc.addField("domainName", domain.get("domainName"));
>>>>>> 
>>>>>>       String s = domain.get("columns").toString();
>>>>>>       obj= JSONValue.parse(s);
>>>>>>       JSONArray ColumnsArray = (JSONArray) obj;
>>>>>> 
>>>>>>       SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>>>       columnsDoc.addField("id", generateID());
>>>>>> 
>>>>>>       for(int j = 0; j<ColumnsArray.size(); j++){
>>>>>>           JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>>>           SolrInputDocument columnDoc = new SolrInputDocument();
>>>>>>           columnDoc.addField("id", generateID());
>>>>>>           columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>>>           columnsDoc.addChildDocument(columnDoc);
>>>>>>       }
>>>>>>       domainDoc.addChildDocument(columnsDoc);
>>>>>>       childEventDescEvent.addChildDocument(domainDoc);
>>>>>>   }
>>>>>> 
>>>>>>   mainEvent.addChildDocument(childEventDescEvent);
>>>>>>   mainEvent.addChildDocument(childUserEvent);
>>>>>>   batch.add(mainEvent);
>>>>>>   solr.add(batch);
>>>>>>   solr.commit();
>>>>>> }
>>>>>> 
>>>>>> When I try to index using the above code, I am able to index only 12
>>>>>> objects per second. Is there a faster way to do the indexing? I believe I
>>>>>> am using the json-fast parser, which is one of the fastest parsers for JSON.
>>>>>> 
>>>>>> Your help will be very valuable to me.
>>>>>> 
>>>>>> Thanks,
>>>>>> Vineeth
>> 
