AJF added a comment.
@Lucas_Werkmeister_WMDE TSV download format. I opened the file in Notepad++, which reports the encoding as ANSI, although it has always been UTF-8. Some of the encoding issues are inserting line breaks into the download, corrupting both the entities and the file. For example (sample taken from the example query linked previously):
http://www.wikidata.org/entity/Q3157864 Jacques-Antoine-Marie Lemoine 3 Jacques-antoine-marie lemoine
http://www.wikidata.org/entity/Q3157864 Jacques-Antoine-Marie Lemoine 3 Jacques-Antoine-Marie Lemoyne
http://www.wikidata.org/entity/Q3161723 Jan Ml
och 3 Jan Mlcoch
http://www.wikidata.org/entity/Q1964408 Nan Hoover 6 Nancy Dodge Browne
This happens even before converting the file back to UTF-8.
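To check whether a downloaded file really is valid UTF-8 (independent of what an editor guesses) and to locate rows that a stray line break has split, a minimal sketch like the one below may help. It assumes the file is tab-separated with a header row and treats only CR/LF as line breaks; the function name `check_tsv` and the header names in the sample are hypothetical, and the sample rows are copied from the excerpt above.

```python
import re


def check_tsv(data: bytes) -> list[int]:
    """Return 1-based line numbers whose field count differs from the header's.

    Hypothetical helper: assumes a tab-separated file with a header row.
    """
    # Strict decoding raises UnicodeDecodeError if the bytes are not valid
    # UTF-8, so a genuine encoding problem is distinguished from an editor
    # merely labelling the file "ANSI".
    text = data.decode("utf-8")

    # Split on CRLF, lone CR, or lone LF, the breaks a text editor shows.
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a trailing newline
    if not lines:
        return []

    expected = lines[0].count("\t") + 1  # number of columns in the header
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count("\t") + 1 != expected
    ]


# Hypothetical header; rows reproduce the sample, including the split row.
SAMPLE = (
    "?item\t?itemLabel\t?count\t?label\n"
    "http://www.wikidata.org/entity/Q1964408\tNan Hoover\t6\tNancy Dodge Browne\n"
    "http://www.wikidata.org/entity/Q3161723\tJan Ml\n"
    "och\t3\tJan Mlcoch\n"
).encode("utf-8")

print(check_tsv(SAMPLE))  # lines 3 and 4 are the two halves of the broken row
```

If `check_tsv` raises `UnicodeDecodeError`, the bytes themselves are not UTF-8; if it decodes cleanly but reports line numbers, the file is UTF-8 and the problem is line breaks embedded in the values.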
To: Smalyshev, AJF
Cc: AJF, gerritbot, Jay8g, TerraCodes, Esc3300, Stashbot, Gehel, Mbch331, Smalyshev, VIGNERON, Lea_Lacroix_WMDE, Lucas_Werkmeister_WMDE, Jonas, Ash_Crow, abian, Aklapper, Lahi, Gq86, Baloch007, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, merbst, Avner, Lewizho99, Maathavan, debt, FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
