[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2016-09-22 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-401:
--
Fix Version/s: (was: 0.8)
   0.7

> Serialization and deserialization of Persistent does not hold the entity 
> dirty state from Map to Reduce
> ---
>
> Key: GORA-401
> URL: https://issues.apache.org/jira/browse/GORA-401
> Project: Apache Gora
>  Issue Type: Bug
>  Components: gora-core
>Affects Versions: 0.4, 0.5
> Environment: Tested on gora-0.4, but seems logically to hold on 
> gora-0.5. HBase backend.
>Reporter: Alfonso Nishikawa
>Assignee: Kevin Ratnasekera
>Priority: Critical
>  Labels: serialization
> Fix For: 0.7
>
> Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
> GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch, GORA-401v5.patch
>
>   Original Estimate: 35h
>  Time Spent: 21h
>  Remaining Estimate: 0h
>
> After the __g__dirty field was removed in GORA-326, the dirty state is no longer serialized. 
> In GORA-321, 
> {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
>  went from using 
> {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
>  to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty 
> field to Avro (although it is not really desirable to have that field as a main 
> field in the entities).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which 
> will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its 
> phases and serializes entities (from Map to Reduce). On deserialization it finds 
> all fields marked as "dirty", regardless of which fields were modified in the Map, 
> and overwrites all data in the datastore (deleting many things: downloaded 
> content, parsed content, etc.).
> This effect can be seen in 
> {{TestPersistentSerialization#testSerderEmployeeTwoFields}}, when debugging in 
> {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections 
> show that entities are "equal" when their fields are equal. This is fine as a 
> definition of "equal", but another test must be added to check that 
> serialization and deserialization keep the dirty state.
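
As a rough illustration of the effect described above, here is a minimal, self-contained Java sketch. It does not use Gora or Avro: DirtyStateDemo, Employee and all method names are hypothetical stand-ins, and the assumption that the reader repopulates the entity through dirty-marking setters is only one plausible way to model the behaviour reported in the issue. The first round trip writes field values only, so every field comes back dirty. The second writes a small dirty bitmask ahead of the values and restores it after reading, which is the general idea behind serializing the entity's internal state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical demo; none of these names come from Gora or Avro.
class DirtyStateDemo {

    /** Toy entity: setters mark the field dirty, as a persistent bean might. */
    static class Employee {
        String name = "";                         // field index 0
        long salary;                              // field index 1
        private final BitSet dirty = new BitSet(2);

        void setName(String v) { name = v; dirty.set(0); }
        void setSalary(long v) { salary = v; dirty.set(1); }
        void clearDirty() { dirty.clear(); }
        void markDirty(int field) { dirty.set(field); }
        boolean isDirty(int field) { return dirty.get(field); }
    }

    /** Entity as a Map task might hold it: loaded clean, then one field changed. */
    static Employee loadedThenModified() {
        Employee e = new Employee();
        e.setName("Ada");
        e.setSalary(100L);
        e.clearDirty();     // state right after loading from the datastore
        e.setSalary(200L);  // only salary is modified in the Map
        return e;
    }

    /** Values only: the dirty state is not part of the payload. */
    static byte[] writeValuesOnly(Employee e) { return write(e, false); }
    static Employee readValuesOnly(byte[] data) { return read(data, false); }

    /** Values preceded by a one-byte dirty bitmask. */
    static byte[] writeWithDirtyBits(Employee e) { return write(e, true); }
    static Employee readWithDirtyBits(byte[] data) { return read(data, true); }

    private static byte[] write(Employee e, boolean withDirtyBits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (withDirtyBits) {
                int mask = (e.isDirty(0) ? 1 : 0) | (e.isDirty(1) ? 2 : 0);
                out.writeByte(mask);
            }
            out.writeUTF(e.name);
            out.writeLong(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    private static Employee read(byte[] data, boolean withDirtyBits) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int mask = withDirtyBits ? in.readUnsignedByte() : 0;
            Employee e = new Employee();
            e.setName(in.readUTF());    // setters flag every field as dirty
            e.setSalary(in.readLong());
            if (withDirtyBits) {
                e.clearDirty();         // drop the flags set while reading
                if ((mask & 1) != 0) e.markDirty(0);
                if ((mask & 2) != 0) e.markDirty(1);
            }
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Employee src = loadedThenModified();
        Employee lossy = readValuesOnly(writeValuesOnly(src));
        Employee kept = readWithDirtyBits(writeWithDirtyBits(src));
        System.out.println("values only:     name dirty=" + lossy.isDirty(0)
                + ", salary dirty=" + lossy.isDirty(1));
        System.out.println("with dirty bits: name dirty=" + kept.isDirty(0)
                + ", salary dirty=" + kept.isDirty(1));
    }
}
```

In this toy model a Reduce-side write of the values-only copy would rewrite both fields, while the copy that carries the bitmask would rewrite only salary. A test along the lines requested in the description would compare the dirty flags before and after the round trip, not just the field values.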



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2016-09-22 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Remaining Estimate: 0h  (was: 14h)

> Serialization and deserialization of Persistent does not hold the entity 
> dirty state from Map to Reduce
> ---
>
> Key: GORA-401
> URL: https://issues.apache.org/jira/browse/GORA-401
> Project: Apache Gora
>  Issue Type: Bug
>  Components: gora-core
>Affects Versions: 0.4, 0.5
> Environment: Tested on gora-0.4, but seems logically to hold on 
> gora-0.5. HBase backend.
>Reporter: Alfonso Nishikawa
>Assignee: Kevin Ratnasekera
>Priority: Critical
>  Labels: serialization
> Fix For: 0.8
>
> Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
> GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch, GORA-401v5.patch
>
>   Original Estimate: 35h
>  Time Spent: 21h
>  Remaining Estimate: 0h
>


[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-03-05 Thread Henry Saputra (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Saputra updated GORA-401:
---
Assignee: Alfonso Nishikawa  (was: Henry Saputra)

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Assignee: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.7

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch, GORA-401v5.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-03-03 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Attachment: GORA-401v5.patch

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Assignee: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.7

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch, GORA-401v5.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-02-03 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-401:
--
Fix Version/s: (was: 0.6)
   0.7

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Assignee: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.7

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-02-02 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-401:
--
Assignee: Alfonso Nishikawa

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Assignee: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.6

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-02-01 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Attachment: GORA-401v4.patch

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.6

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-28 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-401:
--
Fix Version/s: 06

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.6

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-28 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-401:
--
Fix Version/s: (was: 06)
   0.6

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Fix For: 0.6

 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-22 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Attachment: GORA-401v2.patch

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Attachments: GORA-401-tests.patch, GORA-401v1.patch, GORA-401v2.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-22 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Attachment: GORA-401v3.patch

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
 GORA-401v2.patch, GORA-401v3.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-21 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Attachment: GORA-401v1.patch

Uploaded GORA-401v1.patch, a quick-and-dirty patch that works for HBase. 
The other datastores still need to be checked.

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Attachments: GORA-401-tests.patch, GORA-401v1.patch

   Original Estimate: 35h
  Time Spent: 21h
  Remaining Estimate: 14h



[jira] [Updated] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

2015-01-20 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated GORA-401:
---
Assignee: (was: Alfonso Nishikawa)

 Serialization and deserialization of Persistent does not hold the entity 
 dirty state from Map to Reduce
 ---

 Key: GORA-401
 URL: https://issues.apache.org/jira/browse/GORA-401
 Project: Apache Gora
  Issue Type: Bug
  Components: gora-core
Affects Versions: 0.4, 0.5
 Environment: Tested on gora-0.4, but seems logically to hold on 
 gora-0.5. HBase backend.
Reporter: Alfonso Nishikawa
Priority: Critical
  Labels: serialization
 Attachments: GORA-401-tests.patch

   Original Estimate: 35h
  Time Spent: 6.5h
  Remaining Estimate: 28.5h
